Senior Apache Spark Technical Lead - Scala, Python Job Details

India

Job Description

Ambattur, Tamil Nadu

Job Summary

Job Description: Senior Data Engineer (PySpark / Dataproc / GCP)

We are looking for a Senior Data Engineer with strong hands-on expertise in Python (PySpark) and Google Cloud Dataproc to design, develop, and operate scalable data pipelines on Google Cloud Platform. This role focuses on building reliable, production-grade data solutions across batch and streaming use cases.

Key Responsibilities

• Design, build, and optimize data pipelines using PySpark on Dataproc

• Develop performant, maintainable Spark jobs using Python, with a strong focus on reliability and cost efficiency

• Manage Dataproc clusters, including provisioning, tuning, autoscaling, and ephemeral cluster usage

• Design end-to-end data architectures from ingestion to analytics and downstream consumption

• Collaborate with data consumers, platform teams, and stakeholders to deliver scalable solutions

• Ensure data quality, observability, and operational excellence in production environments

Skill Requirements

Required Skills & Experience

Core Skills: PySpark & Dataproc

• Strong expertise in Python, with extensive hands-on experience using PySpark

• Deep experience developing, tuning, and optimizing Spark batch and streaming workloads

• Practical experience with Google Cloud Dataproc, including:

o Cluster lifecycle management

o Initialization actions and custom configurations

o Autoscaling policies and cost optimization

o Use of ephemeral clusters for job-based execution

• Solid understanding of Spark internals (execution plans, caching, partitions, joins, shuffles, checkpointing)

Google Cloud Platform (GCP)

• Strong working experience with core GCP services, including:

o BigQuery for analytics and data warehousing

o Google Cloud Storage (GCS) as a data lake

o Cloud Run for containerized data services and microservices

o Cloud SQL for relational and transactional workloads

o Pub/Sub for event-driven and streaming ingestion

• Familiarity with IAM, service accounts, and secure service-to-service communication

Programming Languages

• Advanced proficiency in Python for production data pipelines

• Experience with Scala and/or Java for Spark development is a plus

• Ability to write clean, testable, and well-documented code

Data Storage & Processing

• Proven experience designing data lakes on GCS, including:

o Partitioning strategies and lifecycle management

o Optimized file formats such as Parquet and Avro

• Strong experience integrating Spark pipelines with BigQuery

• Knowledge of data modeling concepts for analytics and reporting

Workflow Orchestration

• Experience orchestrating pipelines using:

o Apache Airflow (Cloud Composer), or

o Native Dataproc job submissions and workflow templates

• Familiarity with monitoring, alerting, retries, and dependency management

Data Pipeline Design

• Strong experience designing and developing end-to-end data pipelines

• Ability to build scalable, fault-tolerant, and maintainable systems

• Hands-on experience implementing data validation, error handling, logging, and monitoring

• Experience working with both batch and streaming processing patterns

Streaming & Event-Driven Processing

• Hands-on experience with streaming data pipelines

• Practical understanding of event-based ingestion and near real-time processing

Other Requirements

1. Relevant certifications in Apache Spark, Scala, or Python are a plus

Why HCLTech?

At HCLTech, you'll supercharge your potential. You'll find your career. And you'll find your spark. All at a place that knows that helping its customers stay on top starts by putting its people first.

HCLTech is a global technology company, home to more than 226,300 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues as of 12 months ending December 2025 totaled $14.5 billion.

Benefits

At HCLTech, we believe in empowering our employees with comprehensive benefits that support their professional growth and enhance their well-being. When you sign up for a career with us, you gain access to:

• Industry-benchmarked compensation

• Best-in-class healthcare benefits

• Personal time off

• Maternity and paternity benefits

• Access to skills / higher education programs/resources

• Discounts on products and services via Benefit Box

• Participate in CSR programs and live life with a purpose

• Opportunities to grow and advance your career

Note: The benefits listed above vary depending on the nature of your employment and the country where you work. Some benefits may be available in some countries but not in all.
