Job Summary
The Administrator will be responsible for deploying, maintaining, and operating the following critical components:
* Platform Operations: Managing the deployment, scaling, and maintenance of vanilla Kubernetes clusters. This includes overseeing the full Kubernetes upgrade path, managing the container runtime (containerd), and implementing disaster recovery using Velero.
* Automation: Developing and maintaining all infrastructure-as-code. Expert-level proficiency in Ansible, Shell scripting, and Python is mandatory for configuration management, automated deployments, and managing in-house applications.
* Security & Identity: Implementing and enforcing platform security. This involves managing cluster authentication with Dex, handling secrets via HashiCorp Vault, integrating with our KMS (VIPER), and ensuring governance through policy enforcement with OPA/Gatekeeper and Pod Security Policies (PSPs).
* Networking & Load Balancing: Configuring and troubleshooting the networking stack (Calico) and managing bare-metal load balancing solutions (MetalLB).
* Storage Management: Integrating and maintaining enterprise-grade storage arrays using CSI (Container Storage Interface) drivers, specifically working with Dell Isilon and Infinidat storage systems.
* Observability: Maintaining the comprehensive monitoring and logging systems, including the Prometheus/Grafana stack for metrics and alerting, and the ELK (Elasticsearch, Logstash, Kibana) stack for centralized logging.
* Infrastructure Management: Hands-on management of the underlying physical and virtual infrastructure, including Dell physical servers, the Ubuntu OS, TPM integration, and KVM virtualization.
* Advanced Capabilities: Managing specialized hardware, including the operational provisioning and lifecycle of GPU nodes for high-performance computing workloads.
* DevOps Workflow: Maintaining and optimizing our GitLab CI/CD pipelines and managing source control (SCM).
* Open Source Lifecycle: Owning the complete lifecycle (patching, configuration, upgrades) of all integrated open-source components.
Essential Technical Skillset
The incumbent must demonstrate proven, hands-on expertise in the following:
* Mandatory Automation: Full mastery of Ansible, Shell Scripting, and Python for infrastructure automation and management.
* Container Fluency: Deep knowledge of Docker commands is required for effective container inspection, debugging, and image troubleshooting.
* Go Language: A basic understanding of Go is necessary for reading, debugging, and reviewing open-source Kubernetes components and utilities written in Go.
* Security Stack: Proven operational experience with Vault, Dex, and Gatekeeper/OPA.
* Bare-Metal Focus: Demonstrated experience configuring and managing Kubernetes components like Calico and MetalLB in a bare-metal environment.
2. Develop solution presentations and technical documentation utilizing PowerPoint and Word, ensuring clarity and alignment with client needs.
3. Support the configuration and demonstration of product features using relevant platforms, addressing customer queries and showcasing solution fit.
4. Collaborate within the presales team to gather data, prepare responses for RFPs/RFIs, and assist in creating competitive proposal documents.
5. Maintain and update presales knowledge repositories using SharePoint or similar systems to ensure up-to-date information for solutioning.