Job Summary
1. Core Programming
o Python, Scala, Apache Spark (DataFrames, Spark SQL, performance tuning)
o SQL (advanced joins, window functions, query tuning)
o ADO adherence
o Basics of Java – Good to have
2. Data Modeling & Databases
o Data warehousing concepts: star/snowflake schemas, facts & dimensions
o Understanding of data modelling & mapping
3. ETL / ELT & Data Pipelines
o Good understanding of ETL & data processing
o Designing batch and streaming pipelines
o Data integration: files, message queues, etc.
o Hadoop ecosystem (HDFS, Hive); distributed computing concepts (partitioning, shuffling, etc.)
4. Data Quality & Governance
o Data validation, profiling, and monitoring
o Data quality (DQ) controls and framework alignment
o Basic knowledge of data governance, security, and compliance controls
5. DevOps & Engineering Practices
o Version control and branching strategies
o Automated builds, tests and deployments; pipeline-as-code (e.g. YAML-based pipelines)
o Managing artefacts, versioning and rollbacks
6. Production Deployment and Release Management Activities
o Release Planning & Coordination; Code Validation & Post Deployment Checks
o Rollback & Incident Handling
o Continuous Improvement of Release Process
Key Responsibilities
2. To conduct comprehensive code reviews, establish and oversee quality assurance processes, optimize performance, and implement best practices and coding standards to ensure successful delivery of complex projects.
3. To ensure process compliance in the assigned module and participate in technical discussions/reviews as a technical consultant for feasibility studies (technical alternatives, best packages, supporting architecture best practices, technical risks, breakdown into components, estimations).
4. To collaborate with stakeholders to define project scope, objectives and deliverables, and to prepare and submit status reports accordingly to minimize exposure and close escalations.