Job Summary
We are seeking a highly skilled Data Engineer with strong expertise in the Hadoop ecosystem to design, build, and maintain scalable data solutions. The ideal candidate will work on large-scale data platforms and support analytics.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines and data processing systems.
- Work with distributed computing frameworks in the Hadoop ecosystem (HDFS, YARN).
- Develop high-performance data processing jobs using Apache Spark.
- Create, manage, and optimize Hive-based data warehouse solutions.
- Build and maintain ETL/ELT pipelines for structured and unstructured data.
- Write efficient and scalable code using Scala or Java.
- Deploy, monitor, and manage solutions in an AWS cloud environment (S3, EMR, Glue, etc.).
- Ensure data quality, governance, and performance optimization.
- Collaborate with cross-functional teams including Data Scientists and Business stakeholders.
Skill Requirements
Mandatory Skills
- Hands-on experience with:
  - Hadoop (HDFS, Hive)
  - Apache Spark
- Strong programming skills in Scala or Java
- Experience working with AWS services (S3, EMR, Glue or similar)
- Strong SQL and data processing skills
- Good understanding of distributed computing concepts
Other Requirements
- Working knowledge of Python
- Exposure to AI/ML concepts or data pipelines supporting machine learning models
- Experience with streaming tools (Kafka, Spark Streaming)
- Familiarity with orchestration tools like Airflow/Oozie