Job Summary
Senior Network Monitoring Architect (AKIPS Specialist)
Job Overview
The AKIPS platform is expected to provide end-to-end network visibility with near real-time operational insight across routers, switches, firewalls, WAN links, and critical infrastructure devices. The solution should support 60-second polling for key performance indicators, 15-second reachability checks where needed, and scalable monitoring of large interface volumes without creating unnecessary network overhead. It should ingest and correlate SNMP, ping, syslog, SNMP traps, and flow telemetry such as NetFlow to enable faster fault detection and isolation. We also expect strong dashboarding and reporting capabilities for operations teams, service owners, and leadership, including device health, interface utilization, packet drops, errors, latency trends, flap history, and top talkers. The platform should support custom alert thresholds, event filtering, noise suppression, historical trend retention, and the ability to segment views by business service, site, device group, or region. Additionally, the tool should enable integration through APIs or scripts for automation, ticketing, and downstream analytics, while supporting secure monitoring practices such as SNMPv3, backup and restore, and upgrade lifecycle management.
________________________________________
Key Responsibilities
• Architecture & Strategy: Define the target-state architecture for network visibility across hybrid cloud, data center, and campus edge environments using AKIPS.
• Scalability Management: Engineer the server capacity, thick-provisioned VM infrastructure, or cloud-based instances (such as AKIPS on AWS) to support 60-second polling intervals across 1M+ interfaces.
• Automation & Site Scripting: Develop and maintain automated infrastructure discovery and AKIPS Site Scripting features, using Perl and API integrations to extend the default MIB/CLI parsing functionality.
• Dashboard & Reporting Design: Architect role-based operational dashboards and granular event threshold frameworks (Ping, SNMPv3, Syslog, and Traps).
• Vendor & MIB Management: Compile and maintain custom multi-vendor MIB files to track complex telemetry (e.g., optical power trends, hardware vitals, and switch port mappings).
• Lifecycle & Security: Manage software upgrades, backup/restore routines, OS hardening (FreeBSD underlying platform), and secure credential handling via SNMPv3 (SHA/AES).
________________________________________
Required Technical Skills
• Core NMS Expertise: Advanced, architecture-level experience with AKIPS Network Monitoring Software. Additional experience with complementary platforms (e.g., Zabbix, Nagios, or Kentik) is a plus.
• Network Protocols: Expert knowledge of SNMP (v2c/v3), Syslog telemetry, NetFlow/sFlow, and CDP/LLDP, plus deep knowledge of MIB structure mapping.
• Enterprise Infrastructure: Strong understanding of multi-vendor routing and switching hardware (Cisco, Arista, Juniper) across SD-WAN and Data Center layers.
• Scripting & APIs: Proficiency in Perl (for native AKIPS site scripting) or Python/Bash for working with AKIPS API endpoints.
• Systems Administration: Solid experience managing the underlying server layer (FreeBSD/Linux architectures), VM configurations (VMware thick-provisioned storage optimizations), or public cloud hosting deployments.
________________________________________
Qualifications & Experience
• Experience: 8+ years in Network Engineering or Architecture roles, with at least 3 years specifically focused on engineering enterprise Network Management Systems (NMS).
• Education: Bachelor’s degree in Computer Science, Network Engineering, Information Technology, or equivalent practical experience.
• Certifications (Preferred): CCIE/CCNP (Enterprise/Data Center), systems-level certifications.
________________________________________
Key Perf
Key Responsibilities
2. Optimize batch job monitoring processes with tools like Control-M and AutoSys, implementing advanced scheduling and alerting strategies to minimize downtime and SLA breaches.
3. Drive continuous improvement initiatives in monitoring workflows and escalation procedures, leveraging ITIL frameworks and automation platforms to enhance operational efficiency.
4. Guide and mentor the monitoring team in best practices for event correlation, incident triage, and root cause analysis using platforms such as ServiceNow and BMC Remedy.
5. Collaborate with stakeholders to align monitoring solutions with evolving client requirements, delivering tailored dashboards and reporting via tools like Grafana and Kibana.
6. Innovate monitoring processes by evaluating and integrating emerging technologies, ensuring the command center remains at the forefront of operational excellence.
7. Ensure compliance with security and governance standards in all monitoring and event management activities, utilizing SIEM solutions where appropriate.
Skill Requirements
2. Advanced proficiency in automation scripting (Python, PowerShell, Shell) for monitoring optimization and workflow automation.
3. Excellent ability to design, implement, and optimize dashboards and reporting using Grafana, Kibana, or similar tools.
4. Strong expertise in root cause analysis, event correlation, and escalation management using ServiceNow or BMC Remedy.
5. Excellent leadership and mentoring skills for guiding technical teams in high-pressure operational settings.
6. Advanced proficiency in aligning monitoring solutions with business objectives and client SLAs.
Other Requirements
2. Certified in IBM Netcool, Splunk, or equivalent monitoring platforms (optional but valuable)
3. Control-M or AutoSys certification (optional but valuable)