Job Summary
As a DevOps Engineer, you will be instrumental in building, automating, and maintaining the infrastructure and pipelines that support our API program. You will work closely with API developers, platform engineers, and SRE teams to enable rapid, reliable, and secure delivery of products. Your expertise in DevOps practices, CI/CD, and API management will ensure our products are robust, scalable, and efficient.
Key Responsibilities
- Design, build, and maintain CI/CD pipelines for automated build, test, and deployment using GitHub Actions / Jenkins / similar tooling.
- Provision and manage infrastructure using Infrastructure as Code (Terraform), ensuring secure and repeatable deployments.
- Deploy and operate containerized workloads on the cloud platform, including managing rollout strategies and environment promotion.
- Build and maintain platform observability: metrics, logs, alerts, dashboards; drive improvements in availability and MTTR.
- Collaborate with development and security teams to implement DevSecOps practices (secrets management, least privilege IAM, vulnerability scanning, policy compliance).
- Troubleshoot production incidents, drive root cause analysis, and implement preventive automation and resiliency improvements.
Skill Requirements
- Google Cloud Platform (GCP)
  - Strong hands-on experience with core GCP services such as:
    - Compute / Storage / Networking (e.g., VPC, firewall rules, private access patterns)
    - GKE (cluster operations, upgrades, workload troubleshooting)
- CI/CD & Automation
  - CI/CD: GitHub Actions (preferred), Jenkins/GitLab CI (acceptable), Kokoro (preferred)
  - Strong scripting in Bash and/or Python
  - Git best practices: branching strategies, PR reviews, release tagging
- Infrastructure as Code & Configuration
  - Terraform for infrastructure provisioning and modular IaC patterns
- Containers & Orchestration
  - Docker, Kubernetes fundamentals (deployments, services, ingress)
  - Troubleshooting pods, nodes, networking, resource limits/requests
- Databases / Distributed Systems
  - PostgreSQL: operations, tuning, backup/restore, migration support
  - Cassandra: cluster operations, performance tuning, reliability practices
  - ZooKeeper: cluster management, monitoring, stability troubleshooting
- Observability
  - Monitoring/logging tools such as Prometheus/Grafana