Job Summary
Responsible for 24x7 infrastructure monitoring and Control‑M batch job operations, covering job execution, alert management, SLA adherence, and coordination with application and infrastructure teams for timely resolution.
Key Responsibilities
1. Infrastructure Monitoring
- Monitor alerts across domains (Network, Compute, Cloud, IAM, etc.) and take timely action
- Validate alerts and raise incidents where required
- Ensure timely acknowledgement, updates, and warm handovers within SLA
- Continuously monitor Glass Table / dashboards and flag discrepancies or issues
2. Control‑M Batch Operations
- Execute and manage batch jobs across DEV, UAT, and PROD environments
- Order jobs/folders as per requests (daily, monthly, quarterly schedules)
- Monitor job status (success, failure, held) and take corrective action
- Re-trigger failed jobs and ensure successful completion
- Handle held jobs and ensure closure within defined SLA windows
3. Incident & Ticket Management
- Raise incidents for failures, discrepancies, or monitoring gaps
- Perform follow-ups for ticket updates, work notes, and stakeholder communication
- Ensure proper ticket hygiene and SLA compliance
- Categorize incidents accurately and verify business impact
4. Change & Release Support
- Support change execution (job promotions, XML deployments, scheduling changes)
- Validate batch activities post‑deployment
- Participate in ECR approvals and change governance activities
5. Coordination & Escalation
- Coordinate with application, infrastructure, and vendor teams for issue resolution
- Initiate bridge calls for critical failures or widespread impact
- Escalate issues as per severity and business impact
6. Daily Operations & Reporting
- Track job volumes, alert volumes, and incident metrics
- Provide daily/weekly updates on:
  - Job execution status
  - Alert handling
  - Incident trends
- Highlight risks, recurring issues, and improvement areas
7. Continuous Improvement
- Identify false alerts / threshold issues and recommend optimization
- Contribute to KB articles and runbook updates
- Support automation and process improvements in batch operations
Skill Requirements
- Control‑M (Batch scheduling, monitoring, job ordering)
- ITSM tools (ServiceNow or equivalent)
- Monitoring tools (Glass Table / dashboards)
- Incident & Change Management