Job Summary
Senior Administrator - ELK - Elasticsearch, Windows PowerShell
Developer role supporting the GOON and OctoBus platforms under a unified operational model.
Primary Skill: Grafana, Prometheus, programming languages (Python or Java)
Secondary Skill: Kafka
Good to have: Kubernetes, Alertmanager
Key Responsibilities
• Design, develop, and maintain observability platform components and integrations across Prometheus, Thanos, Grafana, OpenTelemetry, and streaming telemetry systems.
• Contribute to architecture and technical design of scalable monitoring solutions running on Kubernetes, Docker, and cloud-native environments.
• Implement standardized instrumentation using OpenTelemetry SDKs, collectors, exporters, and agents across services and infrastructure.
• Build and optimize telemetry pipelines for metrics, logs, and traces using Prometheus, OTEL Collector, Kafka/streaming pipelines, and time-series backends.
• Develop advanced PromQL queries, recording rules, and Alertmanager logic for complex monitoring scenarios.
• Create reusable dashboards and visualization templates using Grafana (and Perses if applicable).
• Automate deployments and configuration using Git, GitHub/GitLab, Jenkins, ArgoCD, Helm, and Infrastructure-as-Code practices.
• Troubleshoot and optimize performance across collectors, exporters, storage backends, and query layers.
• Support performance testing, load validation, and reliability analysis of observability components.
• Collaborate with engineering and SRE teams to onboard services and improve telemetry coverage across platforms.
• Document implementations, standards, and operational procedures.
Skill Requirements
• Strong programming experience in Go, Python, or Java with a focus on backend or platform engineering.
• Hands-on expertise with the Prometheus ecosystem (Prometheus, Alertmanager, exporters, Pushgateway) and PromQL.
• Experience implementing OpenTelemetry instrumentation, collectors, processors, and pipelines.
• Strong knowledge of Kubernetes, containers, Helm, and microservices architecture.
• Experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, or ArgoCD.
• Understanding of distributed systems, performance tuning, debugging, and profiling techniques.
• Familiarity with streaming and messaging systems (e.g., Kafka or equivalent) and time-series databases.
• Experience building or integrating REST/gRPC APIs.
• Proficiency in Git workflows, scripting (Bash/Python), and automation frameworks.
• Understanding of SNMP, exporters, and infrastructure/device telemetry collection.
• Awareness of security, RBAC, secrets management, and compliance requirements in platform environments.