Job Summary
The Observability Ops Engineer (Red Cap) ensures real-time monitoring of infrastructure and application alerts to drive early incident detection and prevention, in line with Cloud Operations Excellence. The role uses advanced observability tools and AI-driven analytics to achieve a time to detect (TTD) of under 3 minutes and to improve the signal-to-noise ratio of alerts.
Key Responsibilities
• Monitor cloud infrastructure and SaaS applications (e.g., S/4HANA, BTP, SuccessFactors, CX, non-ABAP ERP) using tools like Dynatrace, Prometheus, CloudWatch, Elastic, or Azure Monitor.
• Configure and optimize alerting frameworks (e.g., ELK Stack, OpenTelemetry) to minimize false positives.
• Implement AI-driven predictive analytics to prevent incidents (target: 40% reduction in preventable incidents).
• Analyze performance metrics and correlate service impacts using tools like vRealize Operations or ThousandEyes.
• Escalate critical alerts to the Green Cap team with detailed telemetry data.
Skill Requirements
• Bachelor’s degree in IT or a related field.
• 3+ years in IT operations with expertise in cloud monitoring tools (e.g., Dynatrace, New Relic, Prometheus, Kibana, ELK).
• Solid understanding of cloud computing concepts (IaaS, PaaS, SaaS) and platforms (AWS, Azure, GCP, SAP BTP).
• Proficiency in scripting (Python, Bash) and cloud networking concepts.
• Knowledge of SAP ecosystems (S/4HANA, HANA DB, NetWeaver) and Kubernetes.
• Good understanding of OpenTelemetry, performance monitoring, and service correlation.
• Ability to work in a 24/7 shift environment.
• Preferred certifications: AWS Certified Cloud Practitioner, Azure / GCP Fundamentals, ITIL Foundation, CompTIA Cloud Essentials+.
Measurable Outcomes
• Achieve TTD < 3 minutes for 95% of incidents.
• Reduce alert noise by 30% within 6 months.
• Contribute to a 40% reduction in preventable incidents through AI-driven observability by Q2 2026.