Senior SRE Technical Specialist Job Details

Senior SRE Technical Specialist

United Kingdom

Job Description

Bedfordshire, England

Job Summary

The Real Time Payments International team is looking for a Site Reliability Engineer (SRE) to drive application deployment readiness, manage day-to-day operational stability, and support the reliability of critical payment platforms. You will implement automation, apply best practices, and work with a high‑impact team responsible for production readiness, reliability, and DevOps automation across Mastercard platforms.

This role plays a key part in incident management, change readiness, and platform operations, while contributing to continuous improvement initiatives.

Key Responsibilities

Platform Operations & Stability

Support end-to-end availability, monitoring, and performance of critical payment platforms.
Execute operational processes to ensure platform health and stability.
Participate in capacity checks, readiness validations, and environment monitoring.

Incident Management & Execution

Actively manage and coordinate incident triage and resolution.
Serve as incident commander for medium- to high-severity incidents.
Ensure timely updates, accurate impact assessment, and appropriate escalation.
Contribute to root cause analysis with clear identification of actions and ownership.

Change & Release Support

Highlight gaps and help define the test cases a change requires in lower environments, and validate that lower-environment testing is complete.
Ensure adherence to change governance processes (test case reviews, checklists, approvals, rollback readiness).
Engage in creating change plans and support execution of production changes, deployments, and validations.

Technical Troubleshooting

Perform hands-on troubleshooting across:
- Application behaviour and dependencies.
- Infrastructure components (compute, network, storage).
- Database and performance issues.
Collaborate with engineering, infrastructure and other technical teams to isolate and resolve issues efficiently.

Monitoring & Observability

Improve system health monitoring using observability tools and alerts.
Identify gaps in alerting and contribute to improving quality of alerting and dashboards.
Ensure proactive detection of anomalies using observability tools.

Automation & Process Improvement

Contribute to automation initiatives to reduce toil and errors.
Identify repetitive operational tasks and drive improvements.
Support implementation of DevOps best practices.
Leverage AI-driven tools to improve monitoring, incident detection, and operational efficiency, enabling faster troubleshooting and reduced manual effort in day-to-day operations.

Stakeholder Coordination

Work closely with engineering, program teams, and external partners during incidents and changes.
Provide structured updates to stakeholders with clarity and consistency.
Ensure alignment during critical activities.

Risk Identification

Highlight operational and platform risks, including test coverage gaps, infrastructure constraints, and dependency risks.
Escalate issues proactively and support mitigation tracking.

Team Contribution & Mentorship

Support onboarding and guidance of junior team members.
Contribute to runbooks, documentation, and knowledge sharing.
Drive consistency in execution and adherence to operational standards.

Success in This Role Looks Like:

1. Deep Operational Ownership (“Built to Run” Mindset)

A successful Lead SRE Engineer is fully accountable for the operational health of their program, not just responsive to incidents. That accountability covers:

Monitoring, alerting, and dashboards that reflect real customer impact.
Emergency response and incident leadership, including clear communications and post-incident follow‑ups.
Capacity planning and readiness aligned with product and business growth.
Change management discipline, ensuring safe, compliant releases.

2. Strong Technical & System-Level Understanding

A Lead SRE Engineer is expected to operate at the system-dependency level, not just at the ticket or tool level.

Strong understanding of application business logic and workflows.
Clear grasp of upstream/downstream dependencies.
Expertise in observability (alerts, dashboards, synthetic monitoring).
Ability to drive automation to reduce manual toil and recurring issues.
End-to-end ownership of tasks and activities.

3. Incident Leadership & Decision-Making Under Pressure

Beyond technical skill, Leads are distinguished by how they lead during high‑severity situations.

Takes command of major incidents, not waiting to be asked.
Maintains calm, structured communication with engineering, product, and leadership.
Balances speed vs risk in decision-making.
Ensures clear ownership of actions, timelines, and follow‑ups.
Drives root cause analysis and systemic fixes, not just recovery.

4. Proactive Risk & Reliability Engineering

A successful Lead SRE prevents incidents more than fights them.

Identifies systemic risks before they become outages.
Pushes for design, monitoring, or process improvements.
Challenges “tribal knowledge” by insisting on documentation and runbooks.
Drives improvements aligned with operational maturity models.

5. Leadership Without Formal Authority

Lead SRE Engineers often lead without being people managers, which requires strong influence skills.

Mentors and coaches senior and mid-level SREs.
Sets the technical and behavioural bar for the team.
Gives clear, constructive feedback.
Acts as a role model for ownership, urgency, and professionalism.
Builds trust with Engineering, Product, and Platform teams.
Is flexible about working hours where needed.

6. Excellent Cross‑Functional Communication

SRE Leads sit at the intersection of technology, operations, and business.

Translating technical issues into business impact to communicate clearly with senior stakeholders during incidents.
Setting expectations early and transparently with junior team members.
Representing SRE confidently in planning, reviews, and retrospectives.
Ensuring post‑incident learnings are shared and acted upon.

7. Continuous Learning & Product Mastery

A Lead SRE Engineer is expected to continuously deepen product and platform knowledge.

Actively closing knowledge gaps in their program by driving learning within the team.
Staying current with platform changes, dependencies, and risks.
Ensuring knowledge is documented and reusable, not person‑dependent.

Skill Requirements

Experience in production support, SRE, or BizOps roles.
Exposure to managing incidents and supporting distributed systems.
Experience in the payments ecosystem is preferred.
Knowledge of monitoring and alerting tools such as Splunk, Dynatrace, and BlazeMeter.

Knowledge of automation and DevOps practices, with a demonstrated ability to design end‑to‑end CI/CD flows that deliver high‑quality software to production with minimal manual intervention, including centralized configuration and unified pipelines across environments.

Experience working in cross-functional and high-pressure environments.
Ability to organize, multi-task and prioritise work based on current business needs.
Possesses strong verbal and written communication skills.
Strong relationship, collaboration, and stakeholder management skills.
Experience in one or more scripting languages is preferred.
Interest in designing, analysing and troubleshooting large-scale distributed systems.
Ability to work with little or no supervision.

Tech Skills

Operating system: Unix (commands and scripting)
Database: Oracle or equivalent
DevOps: Chef, Jenkins, GitHub, or equivalent tools
Application support: Java-based applications in virtualized environments, with basic knowledge of VMs, hypervisors, etc.
Monitoring: Splunk, Dynatrace, or equivalent
Scripting: Shell or Python preferred, but any equivalent language is acceptable
Experience supporting production systems (preferably in finance) is mandatory

Other Requirements

Relevant certifications in cloud platforms (AWS, Azure, GCP) or DevOps (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator) would be advantageous.

Why HCLTech?

At HCLTech, you'll supercharge your potential. You'll find your career. And you'll find your spark. All at a place that knows that helping its customers stay on top starts by putting its people first.

HCLTech is a global technology company, home to more than 226,300 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues as of 12 months ending December 2025 totaled $14.5 billion.

Benefits

At HCLTech, we believe in empowering our employees with comprehensive benefits that support their professional growth and enhance their well-being. When you sign up for a career with us, you gain access to:

Industry-benchmarked compensation
Best-in-class healthcare benefits
Personal time off
Maternity and paternity benefits
Access to skills / higher education programs/resources
Discounts on products and services via Benefit Box
Participate in CSR programs and live life with a purpose
Opportunities to grow and advance your career

Note: The benefits listed above vary depending on the nature of your employment and the country where you work. Some benefits may be available in some countries but not in all.
