Job Summary
1. Core Programming
o Python, Scala, Apache Spark (DataFrames, Spark SQL, performance tuning)
o SQL (advanced joins, window functions, query tuning)
o ADO adherence
o Basics of Java – Good to have
2. Data Modeling & Databases
o Data warehousing concepts: star/snowflake schemas, facts & dimensions
o Understanding of data modelling and data mapping
3. ETL / ELT & Data Pipelines
o Good understanding of ETL and data processing
o Designing batch and streaming pipelines
o Data integration: files, message queues, etc.
o Hadoop ecosystem (HDFS, Hive); distributed computing concepts (partitioning, shuffling, etc.)
4. Data Quality & Governance
o Data validation, profiling, and monitoring
o Data quality (DQ) controls and framework alignment
o Basic knowledge of data governance, security, and compliance controls
5. DevOps & Engineering Practices
o Version control and branching strategies
o Automated builds, tests, and deployments; pipeline-as-code (e.g. YAML-based pipelines)
o Managing artefacts, versioning and rollbacks
6. Production Deployment and Release Management Activities
o Release planning & coordination; code validation & post-deployment checks
o Rollback & Incident Handling
o Continuous Improvement of Release Process
Key Responsibilities
2. To develop and guide the team members in enhancing their technical capabilities and increasing productivity
3. To ensure process compliance in the assigned module and participate in technical discussions and reviews as a technical consultant for feasibility studies (technical alternatives, best packages, supporting architecture best practices, technical risks, breakdown into components, estimations).
4. To prepare and submit status reports that help minimize exposure and risk on the project and support the closure of escalations.