Job Summary
1. Core Programming
o Python, Scala, Apache Spark (DataFrames, Spark SQL, performance tuning)
o SQL (advanced joins, window functions, query tuning)
o ADO adherence
o Basics of Java (good to have)
2. Data Modeling & Databases
o Data warehousing concepts: star/snowflake schemas, facts & dimensions
o Understanding of data modeling and mapping
3. ETL / ELT & Data Pipelines
o Good understanding of ETL and data processing
o Designing batch and streaming pipelines
o Data integration (files, message queues, etc.)
o Hadoop ecosystem (HDFS, Hive); distributed computing concepts (partitioning, shuffling, etc.)
4. Data Quality & Governance
o Data validation, profiling, and monitoring
o DQ Controls and framework alignment
o Basic knowledge of data governance, security, and compliance controls
5. DevOps & Engineering Practices
o Version control and branching strategies
o Automated builds, tests and deployments; Pipeline-as-code (e.g. YAML-based pipelines)
o Managing artefacts, versioning and rollbacks
6. Production Deployment and Release Management Activities
o Release Planning & Coordination; Code Validation & Post Deployment Checks
o Rollback & Incident Handling
o Continuous Improvement of Release Process
Key Responsibilities
2. To provide support as a Subject Matter Expert
3. To keep knowledge up to date and work with new technologies so that the solution stays current and meets quality standards and client requirements
4. To ensure a sufficient pool of skilled professionals in the designated technology through activities such as conducting interviews, providing training sessions, and offering mentorship.
5. To gather specifications and deliver solutions to the client organization based on understanding of a domain or technology.
6. To support competency development by envisioning and articulating propositions: building collateral, creating whitepapers, analyzing market trends, etc.
7. To recommend client value creation initiatives and implement industry best practices for the specific technology or product