Job Summary
Job Description: Major Incident Manager (MIM)
Role Summary
The Major Incident Manager is responsible for managing high-impact IT incidents from detection through restoration and closure. The role coordinates technical teams, business stakeholders, vendors, and support groups to ensure timely service recovery, accurate communication, proper documentation, and adherence to ITSM and ITIL processes.
Key Responsibilities
- Validate and confirm the priority of incidents proposed as Major Incidents, including P1, P2, and high-visibility P3 scenarios.
- Lead major incident bridge calls, group chats, and war rooms to drive coordinated troubleshooting and rapid restoration.
- Engage the right resolver groups, technical SMEs, vendors, service owners, and leadership stakeholders within agreed timelines.
- Ensure regular and accurate updates are captured in the system of record, including work notes, timelines, actions, decisions, and recovery progress.
- Publish timely stakeholder communications, including start, progress, impact, workaround, restoration, and closure updates.
- Coordinate with Incident, Problem, Change, Service Desk, Monitoring, and Resolver teams to ensure end-to-end incident lifecycle governance.
- Trigger Emergency Change and Problem Management processes where required and support Post-Incident Review and debrief sessions.
- Maintain Major Incident reports, timelines, lessons learned, and action trackers for operational and leadership review.
- Monitor SLA adherence, escalation timelines, resolver participation, and service restoration progress.
- Identify recurring issues, process gaps, and improvement opportunities to reduce repeat incidents and improve service resilience.
Required Skills and Competencies
- Strong understanding of ITSM and ITIL Incident, Problem, and Change Management processes.
- Hands-on experience managing P1/P2 incidents in a 24x7 enterprise environment.
- Excellent communication, facilitation, and stakeholder management skills.
- Ability to work under pressure and coordinate multiple teams during critical incidents.
- Good understanding of enterprise infrastructure, applications, network, cloud, database, and monitoring environments.
- Strong documentation, reporting, analytical, and problem-solving skills.
- Experience with ServiceNow or similar ITSM tools, including Major Incident Workbench or equivalent functionality.
- Ability to manage bridge discipline, action ownership, escalation paths, and business impact communication.
Qualifications and Experience
- Bachelor’s degree in Information Technology, Computer Science, Engineering, or a related discipline.
- 3 to 6 years of experience in Incident Management, Major Incident Management, IT Operations, or IT Service Management.
- Experience working in multi-technology and multi-supplier environments is preferred.
- ITIL Foundation certification is preferred; ITIL Intermediate, ITIL 4, SIAM, or related certifications are an added advantage.
- Experience supporting global customers, critical business applications, and 24x7 operational models is desirable.
Key Responsibilities
2. Develop infrastructure optimization plans using VMware and Hyper-V to improve resource utilization and scalability.
3. Support the implementation of cloud migration strategies on AWS and Azure so that client systems transition and integrate smoothly.
4. Analyze network security posture using Cisco and Fortinet solutions, and provide actionable recommendations to improve protection and compliance.
5. Prepare and present technical documentation and reports using Microsoft Office Suite so that stakeholders stay informed of project progress and outcomes.
6. Collaborate with internal teams to troubleshoot infrastructure issues using ServiceNow and ITIL-based processes, resolving them promptly and with minimal disruption.