Job Summary
Job: Senior Observability Consultant
Grafana, LGTM, Loki, Tempo, Mimir, Python, Observability, Terraform, Ansible
Key Responsibilities
- Design, develop, and implement Grafana dashboards and panels that provide actionable insights into system performance, reliability, and key performance indicators (KPIs).
- Apply in-depth knowledge of the LGTM stack: Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics.
- Collaborate with consumers to understand monitoring needs and translate them into effective Grafana solutions.
- Provide technical guidance and support to teams adopting Grafana for monitoring.
- Conduct training sessions for internal teams on best practices, usage guidelines, and advanced features of Grafana.
- Create comprehensive documentation to facilitate self-service adoption.
- Stay informed about the latest developments in the Grafana ecosystem, including new features, updates, and best practices.
- Evaluate and recommend upgrades or new integrations to enhance our monitoring capabilities.
- Talk to our consumers to identify their observability needs and suggest how to use the tools effectively.
- Guide consumers through onboarding and support them until they realize actual business value from adopting observability.
- Define and implement Service Level Objectives (SLOs) for critical services.
Skill Requirements
- Observability as a core skill, not just from a tooling perspective but also as a practice.
- Proven experience as a Grafana expert with hands-on design and implementation of monitoring solutions.
- Proficiency in Grafana, Prometheus, and related monitoring technologies.
- Experience with Terraform and Ansible; proficiency in OpenTelemetry (OTel) and other instrumentation concepts.
- Scripting, automation, and programming skills are a must-have.
- Experience with data visualization and dashboard design principles.
- Knowledge of containerization technologies is a must-have.
- Experience with observability toolsets other than Splunk and Grafana is a plus.
Other Requirements
- Strong hands-on experience with Google Cloud Platform (GCP).
- Experience managing Kubernetes (GKE) in production environments.
- Proficiency in Terraform or similar IaC tools.
- Experience with CI/CD pipelines, preferably GitHub Actions.
- Solid understanding of Docker and container-based deployments.
- Knowledge of cloud security, IAM, certificates, and secrets management.
- Experience with monitoring, alerting, and operational best practices (CloudOps/SRE fundamentals).
- Understanding of microservices architecture and cloud-native design principles.