Job Summary
Build, operate, and continuously improve the data pipelines, retrieval infrastructure, and ML/LLMOps foundations that power our AI initiatives. This role turns reference architectures and data contracts into robust, production-grade implementations that serve conversational AI assistants, dashboard copilots, autonomous agents, RAG applications, and predictive ML models.
1. Data Pipeline Engineering
2. RAG, Vector & Retrieval Infrastructure
3. Semantic Layer & Knowledge Infrastructure
4. ML/LLMOps Pipeline Support
5. Agentic Data Infrastructure
6. Governance, Security & Data Quality
Experience:
▪ 5–8+ years of data engineering; 2+ years of production AI/ML or LLM-era data infrastructure.
▪ Proven experience building production pipelines at scale (batch and streaming; Snowflake, AWS/Azure).
▪ Deep expertise: Python, PySpark, Snowflake, Delta Lake, Kafka, Spark Structured Streaming.
▪ Hands-on with vector stores, embedding pipelines, and retrieval infrastructure in production RAG environments.
▪ Working knowledge of MLOps: MLflow, CI/CD for AI, automated evaluation, and production monitoring.
▪ Strong grounding in data governance, quality frameworks, and compliance-aligned engineering.
Technical Skills:
Expert: Python, SQL, PySpark, Kafka, Delta Lake, AWS (S3, Glue, Kinesis, EKS, Redshift), Docker, Kubernetes, GitHub Actions, Snowflake
Strong: LangChain, LlamaIndex, LLM APIs (OpenAI, Bedrock, Claude, HuggingFace), Pinecone, FAISS, ChromaDB, OpenSearch, MLflow, FastAPI, Neo4j
Solid: CI/CD pipelines, CloudWatch, Grafana, data lineage platforms, MCP
Familiar: LangGraph, prompt engineering, RLHF dataset prep, LLM fine-tuning workflows
TECH STACK ▪ Delta Lake · PySpark · Kafka · Spark Structured Streaming · Snowflake · AWS (S3, Glue, EKS, Bedrock, Kinesis, Redshift, Lambda) · Azure · Kubernetes · Docker · Terraform · GitHub Actions · Jenkins · MLflow · LangChain · LlamaIndex · HuggingFace · OpenAI · AWS Bedrock · Claude · Pinecone · FAISS · ChromaDB · OpenSearch · Neo4j · FastAPI · Python · SQL · MCP · LangGraph · MLOps · CI/CD · Grafana / CloudWatch
Key Responsibilities
2. Architect and implement RESTful APIs that integrate LLMs with vector databases (e.g., Pinecone, PostgreSQL, Azure AI Search) for scalable, efficient data retrieval in AI applications.
3. Optimize database schemas and embeddings in PostgreSQL and vector databases to improve the performance and accuracy of generative AI systems.
4. Oversee code quality and performance by conducting comprehensive code reviews and enforcing best practices in Python, RESTful API development, and prompt engineering.
5. Lead technical feasibility studies and solution breakdowns, evaluating architecture alternatives and technical risks for GenAI project modules.
6. Collaborate with internal stakeholders to define technical objectives and deliverables, and to ensure process compliance in the development and deployment of AI-powered solutions.
Skill Requirements
2. Solid expertise in Python programming, including frameworks such as Flask, Django, and FastAPI.
3. In-depth knowledge of RESTful API design and implementation for AI integrations.
4. Advanced skills in database management using PostgreSQL, MySQL, and vector database technologies (e.g., Pinecone, Azure AI Search).
5. Strong understanding of embedding techniques and their application in generative AI workflows.
6. Experience in optimizing code quality, performance, and scalability of AI-driven applications.
Other Requirements
2. Certifications such as TensorFlow Developer Certificate
3. Microsoft Azure AI Engineer Associate