Job Summary
Monitor and maintain the health and availability of the Kubernetes control plane and worker nodes.
Perform proactive platform health checks, triage cluster alerts, and support remediation according to runbooks.
Support namespace, quota, and RBAC requests (create/update/validate access).
Troubleshoot pod/runtime issues where the root cause is platform-, config-, or infra-related (excluding application code changes).
Support persistent storage (PV/PVC) lifecycle troubleshooting and escalate when storage backend involvement is needed.
Support Kubernetes networking (CNI/service connectivity) triage and coordinate with network teams as needed.
Other Requirements
Tanzu Kubernetes, VMware Aria