Job Summary
The Data Quality Engineer (Data QE) plays a critical role in validating data pipelines, lakehouse components, and end‑to‑end data flows across the Azure ecosystem. This role ensures that data delivered by the platform is accurate, consistent, reliable, and aligned with business expectations. Unlike Data Engineers, who build pipelines, the QE focuses on verifying correctness, performance, and quality through automated and manual testing frameworks.
Key Responsibilities
- Validate Azure Databricks notebooks, PySpark jobs, and Spark-based transformations.
- Validate data workflows across ADLS layers (Raw/Bronze/Silver/Gold).
- Test batch and streaming data pipelines for completeness, accuracy, and performance.
- Develop automated data quality tests using PySpark and SQL within Databricks.
- Perform source-to-target data reconciliation for large datasets.
- Conduct data profiling to identify anomalies, defects, and data quality gaps.
- Validate SQL transformations, aggregations, KPIs, and dimensional models.
- Test data warehousing concepts including slowly changing dimensions (SCD), fact/dimension structures, and star schemas.
- Validate ADF pipelines, orchestrations, triggers, and integration with Databricks.
- Verify streaming ingestion (Event Hubs/Kafka), checkpointing, and micro‑batch processing.
- Log and track defects with clear reproduction steps and root-cause analysis.
- Generate test plans, test cases, and data validation documentation.
- Collaborate with Data Engineers, Architects, and Product Owners to ensure quality coverage.
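To illustrate the source-to-target reconciliation work described above, here is a minimal sketch in plain Python. In practice this would typically be written against PySpark DataFrames in Databricks; the `reconcile` function and its field names are hypothetical, and the checksum approach is just one of several common reconciliation strategies:

```python
import hashlib

def reconcile(source_rows, target_rows, key_fields):
    """Compare source and target datasets on row counts and per-key checksums.

    Returns counts plus keys that are missing, extra, or mismatched in target.
    """
    def row_key(row):
        return tuple(row[f] for f in key_fields)

    def row_hash(row):
        # Order-independent checksum over all field/value pairs
        payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
        return hashlib.sha256(payload.encode()).hexdigest()

    src = {row_key(r): row_hash(r) for r in source_rows}
    tgt = {row_key(r): row_hash(r) for r in target_rows}

    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": sorted(set(src) - set(tgt)),
        "extra_in_target": sorted(set(tgt) - set(src)),
        "mismatched": sorted(k for k in src.keys() & tgt.keys()
                             if src[k] != tgt[k]),
    }
```

A Databricks version would replace the dictionaries with DataFrame joins and aggregate hashes, so the comparison scales to the large datasets this role works with.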
Skill Requirements
- Databricks (QE-focused testing & validation)
- PySpark (mandatory for writing validations and automated tests)
- SQL (advanced querying & validation)
- Data Warehousing (SCD, star schema, facts, dimensions, KPI logic)
- Strong understanding of the Azure data ecosystem (ADF, ADLS, batch & streaming)
- Hands-on experience validating Spark architectures and distributed data processing
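The SCD and dimensional-modeling skills above translate into concrete invariant checks. The sketch below, in plain Python with hypothetical column names (`customer_id`, `effective_from`, `effective_to`, `is_current`), shows the kind of SCD Type 2 validation a Data QE might automate: exactly one current record per business key, and no overlapping effective-date ranges:

```python
from datetime import date

def validate_scd2(rows, key="customer_id", start="effective_from",
                  end="effective_to", current="is_current"):
    """Check SCD Type 2 invariants per business key and return a list of errors."""
    errors = []
    by_key = {}
    for r in rows:
        by_key.setdefault(r[key], []).append(r)

    for k, versions in by_key.items():
        # Invariant 1: exactly one open (current) version per key
        if sum(1 for v in versions if v[current]) != 1:
            errors.append(f"{k}: expected exactly one current record")
        # Invariant 2: each version must close before the next one opens
        ordered = sorted(versions, key=lambda v: v[start])
        for prev, nxt in zip(ordered, ordered[1:]):
            if prev[end] is None or prev[end] > nxt[start]:
                errors.append(f"{k}: overlapping ranges at {nxt[start]}")
    return errors
```

In a Databricks setting the same invariants are usually expressed as window-function queries over the dimension table, with any rows returned logged as defects.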