Job Summary
Years of Exp:
Location – Pan India or any constraint: Hyderabad
Project type (Support / Dev / Maintenance / Testing): Dev
Shift involved – If yes please share timing: No
Customer interview involved – Yes / No: No
Salary Range:

Skills | Mandatory | Proficiency Level (1-5)
GCP Data Orchestration (Composer/Airflow) | Yes | 5
Real-time Processing (Dataflow/Pub/Sub) | Yes | 5
BigQuery & BigQuery ML (BQML) | Yes | 5
Generative AI Integration (Gemini/AI Studio) | Yes | 4
Data Governance & Security | Yes | 5
Automated MLOps Pipelines (Vertex AI) | Yes | 4

Responsibilities:
Data Pipeline Development: Design, build, and maintain scalable, efficient, and reliable ETL/ELT data pipelines for batch and real-time processing using GCP services (e.g., Dataflow, Dataproc, Cloud Composer, Pub/Sub).
AI/ML Data Preparation: Collaborate closely with Data Scientists and Machine Learning Engineers to understand data requirements for model training, evaluation, and serving. Prepare, transform, and curate large, diverse datasets (structured, unstructured, streaming) to optimize them for AI/ML workloads.
GCP Ecosystem Expertise: Leverage a wide range of GCP data and AI/ML services, including:
Data Warehousing & Storage: BigQuery (for analytics and BigQuery ML), Cloud Storage, Cloud SQL, Cloud Bigtable.
Data Processing: Dataflow, Dataproc (Spark, Hadoop), Cloud Composer (Apache Airflow), Data Fusion.
AI/ML Services: Gemini, AI Studio, Vertex AI (for model training, deployment, MLOps, Pipelines, Workbench, AutoML). Candidates should be able to write prompts and build applications using AI Studio.
Data Governance & Quality: Implement and enforce data quality, security, and governance standards throughout the data lifecycle, ensuring data accuracy, consistency, and compliance with regulations.
Performance Optimization: Monitor, troubleshoot, and optimize the performance and cost-effectiveness of data pipelines and AI/ML infrastructure.
Automation & MLOps: Automate data processes, develop CI/CD pipelines for data and ML models, and contribute to MLOps best practices for seamless deployment and monitoring of AI/ML solutions.
Collaboration & Communication: Work effectively with cross-functional teams, including Data Scientists, Analysts, Software Engineers, and Product Managers, to understand data needs and deliver impactful solutions.
Innovation & Research: Stay up to date with the latest advancements in data engineering, AI/ML, and GCP technologies, and continuously explore and recommend new tools and approaches.