Job Summary
We are looking for a Data Engineer who can:
- design, build, and automate ETL pipelines that ingest data from multiple sources (e.g., decoding feeds) and deliver reproducible, model-ready datasets
- build and maintain data warehouses and lakes, plus all supporting database structures
- provision and run the data infrastructure for storage, scheduling, and orchestration (Airflow) on both GCP and on-prem systems
- embed rigorous validation, monitoring, logging, and governance to meet GDPR and EU AI Act requirements
- collaborate with AI engineers to supply high-quality datasets on time
Key Skills: Data Engineer, AWS, GCP, Google Cloud Platform, Hybrid, Infrastructure, On-prem, ETL, ELT, Data Pipelines, Data Warehousing, Data Lakes, Airflow, Terraform, Python, SQL, Spark, Pandas, Infrastructure-as-Code, IaC, Pulumi, Ansible, CI/CD, GitHub Actions, Jenkins, Containerization, GDPR, Parquet, Iceberg, SNS, Pub/Sub, Data Validation, IAM.
Screening Criteria:
- What data engineering tools or platforms have you used most frequently?
- How comfortable are you with SQL, and what kind of queries do you typically write?
- How do you make sure data pipelines are reliable and that data quality issues are caught?
- Who do you typically work with: analysts, data scientists, or software engineers?