Job Summary
Responsibilities:
Architectural Blueprinting: Design scalable and secure data platform blueprints (e.g., Lakehouse, Data Mesh, or Data Fabric) that support diverse AI workloads, including generative AI and classical machine learning. Build scalable, cloud-native storage and processing frameworks (data lakes, lakehouses) capable of handling massive datasets for model training.
AI Data Infrastructure Design: Develop architectures for AI-driven workflows, including feature stores, real-time data streaming (Kafka/Spark), and automated machine learning pipelines.
Data Lifecycle Management: Oversee the end-to-end data lifecycle, from high-fidelity data acquisition and cleaning through preprocessing to model serving.
Data Pipeline Automation: Create end-to-end automated pipelines for data ingestion, cleaning, and feature engineering that shorten the path from raw data to ML model input. Architect systems that support streaming data (e.g., Kafka, Kinesis) for low-latency inference in applications such as IoT, fraud detection, and customer experience. Implement strict governance, including metadata management, data lineage (tracking data origin), and quality monitoring, to ensure clean data and prevent model failure.
Governance & Ethics: Establish unified data governance frameworks that ensure security, privacy (GDPR/CCPA), and compliance while mitigating algorithmic bias.
Stakeholder Collaboration: Act as the technical bridge between business leadership, data science teams, and IT infrastructure to align technology with strategic AI objectives.
Security & Compliance: Embed zero-trust principles, role-based access control (RBAC), and regulatory compliance (GDPR, HIPAA) directly into the data architecture.
MLOps Collaboration: Work closely with data scientists and MLOps teams to integrate feature stores, model registries, and monitoring tools for continuous retraining.
Qualifications & Experience
Bachelor’s or Master’s degree in Computer Science, Information Systems, Engineering, or a related field.
10–16 years of experience with data warehouse / Big Data platforms, including at least 3–5 years focused on infrastructure supporting AI/ML.
Band: 4.2/5.1
Deep expertise in cloud platforms such as AWS, Azure, or Google Cloud, and in big data technologies such as Apache Spark, ADF, Databricks, and Snowflake.
Experience with data governance, security, and compliance standards.
Excellent communication and stakeholder management skills.
Keywords: Focus on strategy, blueprinting, and high-level integration. Data Platform Architect; Vector Databases (Pinecone, pgvector, Oracle Vector DB, etc.); Data Lakehouses (Databricks, Snowflake, etc.); Knowledge Graphs; Data Mesh; Data Fabric; Lakehouse Architecture; Hub-and-Spoke; Lambda/Kappa.
Key Responsibilities
1. To develop platform-specific architecture solutions.
2. To act as an SME, guiding the team in delivering high-quality solutions that adhere to client requirements and policies.
3. To effectively respond to RFPs.
4. To provide cost and pricing data, recovery principles, patterns, and usage guidance.
5. To effectively translate client requirements into technical solutions.
6. To identify new opportunities for PaaS/SaaS solutions across the cloud service provider space.