Senior Technical Lead
United States
Job Description
Santa Clara, California

Job Summary

We are seeking a highly motivated Data Platform Engineer with expertise in Big Data technologies, Cloud Platforms, Kubernetes, and Site Reliability Engineering (SRE) to design, build, manage, and optimize large-scale distributed data platforms. The ideal candidate will have strong experience supporting Hadoop and Spark ecosystems, managing cloud-based infrastructure, improving platform reliability, and enabling real-time data processing at scale.

Key Responsibilities

Big Data Platform Administration

Design, deploy, and manage large-scale Hadoop clusters supporting enterprise data workloads.

Administer and optimize Hadoop ecosystem components including HDFS, YARN, Hive, HBase, Spark, Kafka, and Oozie.

Ensure high availability, scalability, and performance of distributed data platforms.

Support data ingestion, storage, processing, and analytics workloads across batch and streaming environments.
Spark & Data Processing

Develop, optimize, and troubleshoot Spark (PySpark) applications for large-scale batch and streaming workloads.

Perform Spark performance tuning through partitioning, caching, memory optimization, and executor configuration.

Analyze and resolve Spark job failures, resource bottlenecks, and performance issues.

Support Spark Streaming applications integrated with Kafka.
Cloud & Infrastructure Engineering

Build and manage cloud infrastructure on AWS using EC2, S3, IAM, VPC, CloudWatch, RDS, and related services.

Automate infrastructure provisioning and management using Terraform and Infrastructure-as-Code principles.

Implement scalable, secure, and resilient cloud architectures.
Kubernetes & Containerization

Containerize applications using Docker and deploy workloads on Kubernetes.

Manage Kubernetes clusters, deployments, services, pods, and Helm charts.

Monitor cluster health, troubleshoot workload issues, and optimize resource utilization.

Support Spark and data workloads running on Kubernetes platforms.
Site Reliability Engineering (SRE)

Implement SRE best practices to improve platform reliability, availability, and operational efficiency.

Define and monitor SLIs, SLOs, and reliability metrics.

Participate in incident response, root cause analysis (RCA), and post-incident reviews.

Drive continuous improvements to reduce MTTR and prevent recurring issues.
Monitoring & Observability

Build and maintain monitoring, alerting, and logging solutions using:
o Grafana
o ELK Stack
o Splunk
o CloudWatch
o Nagios
o Datadog

Create dashboards and proactive alerting mechanisms to ensure system health and performance.
DevOps & Automation

Develop CI/CD pipelines using Jenkins, Git, and related DevOps tools.

Automate operational tasks using Python and Shell scripting.

Implement deployment automation and configuration management practices.
Security & Governance

Implement security controls using IAM, RBAC, Apache Ranger, and network security best practices.

Ensure compliance with organizational security and governance standards.

Support secure access management and data protection initiatives.

Skill Requirements

Bachelor's or Master's degree in Computer Science, Information Technology, Data Engineering, or a related field.

5+ years of experience in Data Engineering, Platform Engineering, SRE, or Big Data Administration.

Strong experience with:
o Hadoop Ecosystem (HDFS, YARN, Hive, HBase, Oozie)
o Apache Spark (PySpark, Spark Streaming)
o Apache Kafka
o Kubernetes and Docker
o AWS Cloud Platform
o Terraform
o Linux Administration

Experience with monitoring and observability platforms.

Strong troubleshooting and incident management skills.

Proficiency in Python and Shell scripting.

Technical Skills
Cloud Platforms

AWS (EC2, S3, IAM, VPC, CloudWatch, RDS)

Microsoft Azure
Big Data Technologies

Hadoop (HDFS, YARN, MapReduce)

Apache Spark (PySpark, Spark Streaming)

Kafka

Hive

HBase

Apache Druid

Oozie
Container & Orchestration

Docker

Kubernetes

Helm

DevOps & Automation

Terraform

Jenkins

Git

Chef
Monitoring & Logging

Grafana

ELK Stack

Splunk

CloudWatch

Nagios

Datadog
Programming

Python

Shell Scripting
Operating Systems

Linux (RHEL, CentOS, Ubuntu)

Unix
Methodologies

Agile/Scrum

DevOps

Site Reliability Engineering (SRE)

Other Requirements

Experience with Apache Druid for real-time analytics.

Exposure to Azure cloud services.

Experience implementing SRE frameworks and reliability engineering practices.

Knowledge of CI/CD and DevOps methodologies.

Experience supporting large-scale production data platforms.

Minimum Salary (US): 78,000
Maximum Salary (US): 148,000
Why HCLTech?

At HCLTech, you'll supercharge your potential. You'll find your career. And you'll find your spark. All at a place that knows that helping its customers stay on top starts by putting its people first.

HCLTech is a global technology company, home to more than 226,300 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues as of 12 months ending December 2025 totaled $14.5 billion.

Compensation and Benefits

A candidate’s pay within the range will depend on their skills, experience, education, and other factors permitted by law. This role may also be eligible for performance-based bonuses subject to company policies. In addition, this role is eligible for the following benefits subject to company policies: medical, dental, vision, pharmacy, life, accidental death & dismemberment, and disability insurance; employee assistance program; 401(k) retirement plan; 10 days of paid time off per year (some positions are eligible for need-based leave with no designated number of leave days per year); and 10 paid holidays per year.