Job Summary
Develop observability, monitoring, and KPI dashboards for the agentic AI ecosystem to ensure operational excellence and measurable business outcomes.
The Subject Matter Expert (Support & Ops) plays a critical role in ensuring timely resolution of escalations and incidents while maintaining compliance with quality norms and service level agreements (SLAs). This position is pivotal in driving customer satisfaction through effective communication, mentorship, and operational excellence in data management practices.
Key Responsibilities
Build agent health monitoring and alerting subsystems. Design KPI dashboards tracking incident/request/change volumes, success rate, MTTD, and MTTR. Implement logging, telemetry, and observability hooks. Integrate dashboards with leadership reporting tools. Ensure SLA-aligned reporting and exception handling.
1. Ensure timely resolution and compliance of escalated tickets and incidents by applying ETL methodologies and data lake principles, adhering to agreed SLAs.
2. Mentor team members and administrators, preparing standard operating procedures (SOPs) and maintaining comprehensive documentation to enhance team performance and knowledge sharing.
3. Validate change order implementation plans and ensure compliance with human-error prevention norms while actively participating in capacity planning processes using AWS core services.
4. Foster positive customer relationships by engaging in customer meetings to understand and address challenges, ensuring a high level of customer satisfaction.
5. Validate root cause analyses and trend analyses, producing reports that facilitate performance improvements and effectively communicating findings to key business stakeholders.
Skill Requirements
Strong Python and dashboarding skills (Looker, Power BI, Grafana). Experience with monitoring stacks (Prometheus, Cloud Monitoring). Understanding of ITSM KPIs (MTTD, MTTR, success rate). Exposure to agentic/AI workflow monitoring. Strong analytical and data visualization mindset.
1. Proficient in ETL processes and data lake architectures.
2. Strong understanding of AWS core services, RDS, and analytics tools.
3. Excellent analytical and problem-solving skills.
4. Effective communication and stakeholder management capabilities.
Other Requirements
1. AWS Certified Data Analytics Specialty (optional but valuable)
2. ITIL Foundation certification (optional but valuable)