Senior Technical Lead Job Details

Senior Technical Lead

United States

Job Description

Santa Clara, California

Job Summary

We are seeking a highly motivated Data Platform Engineer with expertise in Big Data technologies, Cloud Platforms, Kubernetes, and Site Reliability Engineering (SRE) to design, build, manage, and optimize large-scale distributed data platforms. The ideal candidate will have strong experience supporting Hadoop and Spark ecosystems, managing cloud-based infrastructure, improving platform reliability, and enabling real-time data processing at scale.

Key Responsibilities

Big Data Platform Administration
• Design, deploy, and manage large-scale Hadoop clusters supporting enterprise data workloads.
• Administer and optimize Hadoop ecosystem components including HDFS, YARN, Hive, HBase, Spark, Kafka, and Oozie.
• Ensure high availability, scalability, and performance of distributed data platforms.
• Support data ingestion, storage, processing, and analytics workloads across batch and streaming environments.
Spark & Data Processing
• Develop, optimize, and troubleshoot Spark (PySpark) applications for large-scale batch and streaming workloads.
• Perform Spark performance tuning through partitioning, caching, memory optimization, and executor configuration.
• Analyze and resolve Spark job failures, resource bottlenecks, and performance issues.
• Support Spark Streaming applications integrated with Kafka.
Cloud & Infrastructure Engineering
• Build and manage cloud infrastructure on AWS using services such as EC2, S3, IAM, VPC, CloudWatch, RDS, and related services.
• Automate infrastructure provisioning and management using Terraform and Infrastructure-as-Code principles.
• Implement scalable, secure, and resilient cloud architectures.
Kubernetes & Containerization
• Containerize applications using Docker and deploy workloads on Kubernetes.
• Manage Kubernetes clusters, deployments, services, pods, and Helm charts.
• Monitor cluster health, troubleshoot workload issues, and optimize resource utilization.
• Support Spark and data workloads running on Kubernetes platforms.
Site Reliability Engineering (SRE)
• Implement SRE best practices to improve platform reliability, availability, and operational efficiency.
• Define and monitor SLIs, SLOs, and reliability metrics.
• Participate in incident response, root cause analysis (RCA), and post-incident reviews.
• Drive continuous improvements to reduce MTTR and prevent recurring issues.
Monitoring & Observability
• Build and maintain monitoring, alerting, and logging solutions using:
  o Grafana
  o ELK Stack
  o Splunk
  o CloudWatch
  o Nagios
  o Datadog
• Create dashboards and proactive alerting mechanisms to ensure system health and performance.
DevOps & Automation
• Develop CI/CD pipelines using Jenkins, Git, and related DevOps tools.
• Automate operational tasks using Python and Shell scripting.
• Implement deployment automation and configuration management practices.
Security & Governance
• Implement security controls using IAM, RBAC, Apache Ranger, and network security best practices.
• Ensure compliance with organizational security and governance standards.
• Support secure access management and data protection initiatives.

Skill Requirements

• Bachelor's or Master's degree in Computer Science, Information Technology, Data Engineering, or a related field.
• 5+ years of experience in Data Engineering, Platform Engineering, SRE, or Big Data Administration.
• Strong experience with:
  o Hadoop Ecosystem (HDFS, YARN, Hive, HBase, Oozie)
  o Apache Spark (PySpark, Spark Streaming)
  o Apache Kafka
  o Kubernetes and Docker
  o AWS Cloud Platform
  o Terraform
  o Linux Administration
• Experience with monitoring and observability platforms.
• Strong troubleshooting and incident management skills.
• Proficiency in Python and Shell scripting.

Technical Skills
Cloud Platforms
• AWS (EC2, S3, IAM, VPC, CloudWatch, RDS)
• Microsoft Azure
Big Data Technologies
• Hadoop (HDFS, YARN, MapReduce)
• Apache Spark (PySpark, Spark Streaming)
• Kafka
• Hive
• HBase
• Apache Druid
• Oozie
Container & Orchestration
• Docker
• Kubernetes
• Helm
DevOps & Automation
• Terraform
• Jenkins
• Git
• Chef
Monitoring & Logging
• Grafana
• ELK Stack
• Splunk
• CloudWatch
• Nagios
• Datadog
Programming
• Python
• Shell Scripting
Operating Systems
• Linux (RHEL, CentOS, Ubuntu)
• Unix
Methodologies
• Agile/Scrum
• DevOps
• Site Reliability Engineering (SRE)

Other Requirements

• Experience with Apache Druid for real-time analytics.
• Exposure to Azure cloud services.
• Experience implementing SRE frameworks and reliability engineering practices.
• Knowledge of CI/CD and DevOps methodologies.
• Experience supporting large-scale production data platforms.

Minimum Salary (US): 78000

Maximum Salary (US): 148000

Why HCLTech?

At HCLTech, you'll supercharge your potential. You'll find your career. And you'll find your spark. All at a place that knows that helping its customers stay on top starts by putting its people first.

HCLTech is a global technology company, home to more than 226,300 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues as of 12 months ending December 2025 totaled $14.5 billion.

Compensation and Benefits

A candidate’s pay within the range will depend on their skills, experience, education, and other factors permitted by law. This role may also be eligible for performance-based bonuses subject to company policies. In addition, this role is eligible for the following benefits subject to company policies: medical, dental, vision, pharmacy, life, accidental death & dismemberment, and disability insurance; employee assistance program; 401(k) retirement plan; 10 days of paid time off per year (some positions are eligible for need-based leave with no designated number of leave days per year); and 10 paid holidays per year.
