Job Summary
Key Responsibilities:
- Design and implement data pipelines using best practices and industry-leading tools like Databricks and Azure Data Factory (ADF)
- Extract, transform, and load large datasets from various sources, ensuring data quality and integrity
- Utilize Python and Spark to perform complex data manipulations and aggregations
- Write optimized SQL queries to interact with relational databases and Cosmos DB
- Integrate data pipelines with APIs and external systems using efficient methods
- Monitor and maintain data pipelines, ensuring smooth operation and identifying potential issues
- Collaborate with data scientists, analysts, and engineers to understand data needs and deliver valuable insights
Key Responsibilities
1. Design and develop efficient and reliable ETL processes for large datasets.
2. Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
3. Optimize data workflows, troubleshoot issues, and ensure data quality and integrity.
4. Implement best practices for data security, governance, and compliance.
5. Provide technical guidance, mentoring, and support to junior team members.
6. Stay up to date with the latest trends and technologies in data engineering and analytics.
Skill Requirements
- 3 to 7 years of experience with IT / Azure, Python, SQL, Kafka, NoSQL, .NET C#, Snowflake, IICS
- Bachelor’s degree in Computer Science, Information Technology, or a related STEM field, or equivalent experience.
- Good understanding of massively parallel processing (MPP) systems; experience building data warehouses/data marts on Azure Synapse SQL pools (SQL DW)
- Strong SQL skills and experience writing complex yet efficient stored procedures, functions, and views using T-SQL
- Solid understanding of Spark architecture and experience performance-tuning big data workloads in Spark
- Experience building complex data transformations on both structured and semi-structured data (XML/JSON) using PySpark and SQL, and refactoring traditional ML models to run on the Spark framework
- Familiarity with Cognitive Search/Elasticsearch and their use cases, and with building integrations that load data into search services
- Familiarity with the Azure Databricks environment and deploying Spark code to Databricks clusters
- Good understanding of NoSQL and its use cases, modeling NoSQL schemas and containers, and building integrations to read from and write to Cosmos DB
- Good understanding of distributed systems and experience building real-time integrations with Kafka
- Good understanding of the Azure cloud ecosystem; an Azure data certification (DP-200, DP-201, or DP-203) is an advantage
- Proficient with Visual Studio 2019+, IntelliJ/Eclipse, and source control using Git
- Good understanding of Agile, DevOps, and CI/CD automated deployment (e.g., Azure DevOps, Jenkins)
- Good knowledge of microservices architecture; experience building microservices with .NET Core Web API is an advantage
- Experience building RESTful services using Python FastAPI is an advantage
- Experience with Snowflake data platform, including data loading, transformation, and query optimization
Other Requirements
1. Relevant certifications in Azure Data Factory, Azure Databricks, SQL, Oracle, or Python would be a plus.
2. PySpark, Kibana, SQL, and NoSQL; .NET is also good to have.