Job Summary
AI Architect

The Role
The AI Architect is responsible for defining, designing, and governing end-to-end artificial intelligence system architectures that align with business objectives, data strategies, and enterprise technology standards. This role provides technical leadership across the AI solution lifecycle, from ideation to production, ensuring scalability, security, interoperability, and regulatory compliance.

Competency Focus: AI infrastructure and architecture design, cloud-native architecture, model governance, large-scale distributed systems

Keywords: HPC Architect, HPC Architecture and System Design

Responsibilities:
- Architect, deploy, and operate large-scale accelerator clusters, including NVIDIA DGX platforms, discrete NVIDIA and AMD GPUs, and TPU-based systems, ensuring high availability, scalability, and performance.
- Design high-bandwidth, low-latency interconnect architectures, leveraging technologies such as InfiniBand, NVLink, and RoCE to support distributed AI training and inference workloads.
- Architect end-to-end AI training and inference platforms across on-premises and public cloud environments (Azure, AWS, GCP), incorporating elastic GPU resource orchestration and automated scaling mechanisms.
- Engineer high-performance, large-scale data delivery and storage solutions, including petabyte-scale object storage and distributed file systems (e.g., VAST Data, WekaIO, DDN) optimized for AI and high-throughput workloads.
- Design streaming and batch data ingestion pipelines optimized for AI/ML workflows, enabling efficient data preprocessing, feature ingestion, and model training at scale.
- Architect and enforce secure GPU and compute isolation mechanisms, using Kubernetes primitives such as RBAC, namespace isolation, and network policies to ensure multi-tenant security, governance, and compliance.
- Evaluate, benchmark, and qualify emerging AI hardware platforms and software frameworks, conducting performance, scalability, and cost-efficiency assessments to inform technology adoption decisions.
- Mentor engineers in AI infrastructure best practices, observability, and capacity management.
- Define the reference architecture for enterprise-wide AI adoption.
- Demonstrate an understanding of Sovereign AI.

Qualifications & Experience
- B.Tech/B.E. in Computer Science, Artificial Intelligence, Data Science, or a related discipline; M.Tech/MS preferred.
- 12+ years in infrastructure/cloud engineering, with 4+ years focused on AI/ML systems.
- Deep expertise in GPU cluster management, distributed compute, and container orchestration.
- Hands-on experience with Kubernetes for AI workloads, GPU scheduling, and Ray/Kubeflow pipelines.
- Basic understanding of LLM training, fine-tuning, quantization, and model optimization.

Certifications Required:
- NVIDIA Certified Associate – AI Infrastructure
- NVIDIA Professional Certification for AI Networking and AI Infrastructure
- Certified Kubernetes Administrator
- Cloud certification (AWS, Azure, or GCP)

How You'll Grow
At HCLTech, we provide continuous opportunities for you to discover your spark and grow through meaningful, hands-on experiences. We encourage you to collaborate across diverse teams, engage in vendor interactions, and build strong professional networks.
Key Responsibilities
1. To develop platform-specific architecture solutions.
2. To act as an SME, guiding the team in delivering high-quality solutions that adhere to client requirements and policies.
3. To respond effectively to RFPs.
4. To provide cost and pricing data, recovery principles, patterns, and usage guidance.
5. To translate client requirements into technical solutions effectively.
6. To identify new opportunities for PaaS/SaaS solutions across the cloud service provider space.