Job Summary
GenAI with Copilot
1) SM AI / AIOps Capabilities (Core Focus)
- Provide predictive insights and anomaly detection across infrastructure and platform signals (metrics/logs/traces/events/tickets), proactively identifying emerging risks and degradation trends.
- Design and implement AIOps patterns such as:
  - Signal correlation (event + topology + service mapping)
  - Noise reduction (deduplication, suppression, alert rationalization)
  - Incident clustering and “probable cause” identification
  - Change risk signals (change-related anomaly detection, blast radius indicators)
- Identify and prioritize automation opportunities:
  - Auto-triage enrichment (context injection into incidents)
  - Auto-routing suggestions aligned to service ownership
  - Auto-remediation for repeatable patterns with guardrails and human-in-the-loop approvals
- Build and maintain insight pipelines that link operational telemetry with ITSM data to produce actionable outcomes for:
  - Major Incident (MI) early warning
  - Incident prevention
  - Faster restoration through decision support
2) Analytics & Governance Dashboards (CC / NOC / SM)
- Build and operationalize dashboards and governance views for CC/NOC/SM leaders, including:
  - Service health and early warnings
  - Alert volume trends, noise ratio, top talkers
  - MTTA/MTTR drivers, recurring patterns
  - MI leading indicators and “near miss” signals
  - Automation impact metrics (time saved, repeat reduction)
- Create operational KPI packs with clear storylines for governance forums (weekly/monthly), enabling data-led decisions and prioritization.
3) Problem Intelligence & Shift‑Left Enablement
- Analyze patterns across incidents, alerts, and changes to:
  - Identify Problem themes, recurring failure modes, and top drivers
  - Generate candidate Known Errors, workaround suggestions, and knowledge articles
  - Recommend shift-left opportunities (L1/L2 enablement) by converting patterns into reusable diagnostics and guided actions
- Partner with Problem, Change, and Service Owners to ensure insights translate into:
  - Problem records with evidence
  - Engineering backlog items
  - Change governance improvements
  - Reduced repeat incidents
4) GenAI / Agentic Automation for Operations (Copilots & Assistants)
- Develop GenAI copilots/agents to support operational workflows:
  - Incident summarization (telemetry + ticket history + change context)
  - Suggested next-best actions and diagnostic steps
  - Automated enrichment and knowledge retrieval (RAG)
  - Runbook guidance and workflow orchestration for repeated tasks
- Build Retrieval-Augmented Generation (RAG) systems using internal knowledge sources (K
Key Responsibilities
Skill Requirements
Technology Skills
Required hands-on experience includes:
- Python — ability to build scalable, production-grade services and analytics pipelines.
- AIOps & Observability Analytics
  - Working with telemetry (metrics/logs/traces/events)
  - Practical exposure to orchestration frameworks (e.g., LangChain/LangGraph/CrewAI or similar)
  - Anomaly detection approaches (statistical + ML-based)
  - Correlation techniques (time-series + topology/context)
  - Alert deduplication / suppression / classification
- Data & Analytics Engineering
  - Data modelling for operational datasets (ITSM + telemetry)
  - SQL and/or equivalent querying capability
  - Dashboard development (BI and/or observability dashboards)
  - Understanding of reasoning patterns and safe operationalization: tool use, verification layers, guardrails, human approvals
- GenAI / RAG for Operational Knowledge
  - Building RAG pipelines, embeddings, vector search concepts
  - Evaluation approaches (grounding, accuracy, hallucination reduction)
  - RAG systems using internal knowledge sources (runbooks, postmortems, KEDB)
- Integration & Automation
  - API integration and enterprise workflow integration patterns
  - Automation frameworks / orchestration basics (human-in-the-loop controls)
  - Designing assistants/agents to support incident triage, diagnostics, summarization, and enrichment
Leadership & Behavioural
- Partner across Infrastructure, CC/NOC, Service Management, Product/Engineering, and Security to deliver operational outcomes.
- Strong stakeholder engagement — able to communicate complex insights clearly to senior stakeholders.
- Pragmatic execution under ambiguity; proactive, outcome-driven delivery.
- Proficient in verbal and written English, with the ability to communicate comfortably with senior management and stakeholders.
Good To Have
- Experience with AIOps platforms and/or enterprise observability tooling (any major platform acceptable).
- Familiarity with ITSM data structures (Incidents/Problems/Changes, categorisation, routing, SLAs).
- React + JavaScript (for lightweight UIs for operational assistants).
- Data Structures & Algorithms (intermediate foundations).
Qualifications
- Graduate or postgraduate degree.
- Experience building analytics/AI solutions for operations (NOC/CC/ITSM) preferred.
- 3-5 years of hands-on experience across AI/analytics stacks (incl. GenAI exposure).
- Overall 8-10 years of experience.