Job Summary
1. Python (Expert): Required for every candidate.
   - Strong understanding of OOP concepts, functional programming, decorators, and design patterns.
   - Ability to write clean, Pythonic code while adhering to PEP standards.
2. Python Packaging & Build (Expert): The most important criterion for selecting a candidate.
   - Knowledge of different Python project structures (e.g., Kedro as a reference, though not standard).
   - Experience in building projects with dependency management (e.g., requirements.txt with fixed and variable versions).
   - Expertise in setup.py, pyproject.toml, and publishing packages for reuse across teams.
   - Ability to package applications and create Docker images for portable execution.
   - Good understanding of CI/CD pipelines in Azure (other CI/CD tools are also acceptable).
3. PySpark (Medium to Advanced): Required for senior positions.
   - Deep understanding of Spark internals, including transformations, handling large datasets, complex joins, and memory management.
   - Hands-on experience in writing and debugging code at the cluster level.
   - Strong knowledge of Spark optimization techniques, including practical implementation (especially related to memory optimization).
4. MLOps Implementation (focus on “how” and “why,” not just “what”): Crucial for the selection process. Even a simple mistake here could change a candidate's prospects.
   - ML System Architecture: Batch and real-time (API-based) systems.
   - Pipelines: Clear understanding of training and scoring workflows, including all key steps.
   - Monitoring: Knowledge of key metrics required post-deployment, especially for classification and regression models.
   - Deployment Strategy: Clear understanding of prioritization when deploying models under time constraints: what must be done immediately vs. what can be deferred.
   - Understanding of the relevant tools across the entire pipeline for solving business challenges.
Optional
5. GCP (good to have; prior experience with any cloud platform is mandatory)
   - Vertex AI (pipelines, models, experiments, feature store, training, benchmarking).
   - Dataproc, BigQuery, Artifact Registry, Monitoring, and GCS.
6. BigQuery / BigQuery ML (good to have)
   - Deep understanding of BigQuery internals and optimization techniques.
   - Ability to efficiently read and write data to/from BigQuery.
   - Understanding of BigQuery ML workflows and their integration with Python (optional but increasingly relevant).
Key Responsibilities
2. Design and optimize algorithms for data processing, feature engineering, and pattern recognition.
3. Lead data exploration, mining, and visualization to uncover trends and actionable insights.
4. Collaborate with cross-functional teams to integrate data science solutions into business strategies.
5. Drive innovation by leveraging advanced statistical techniques, deep learning, and big data technologies.
6. Ensure data integrity, governance, and best practices in model development and deployment.
7. Mentor junior data scientists and promote a data-driven culture within the organization.
8. Stay ahead of industry trends and emerging technologies to enhance analytical capabilities.