Job Summary
We are looking for a Platform Engineer III with deep expertise in AI/ML system design, large-scale deployment, and advanced model architectures (Vision + Generative AI). The role requires designing scalable architectures, enabling MLOps, and driving adoption of advanced AI across NLP, Vision, and multimodal domains.
Key Responsibilities
Architecture & System Design
Design end-to-end AI architectures for NLP, Vision, and multimodal systems.
Architect vision-based AI solutions (image classification, object detection, segmentation, video analytics, OCR, 3D vision, multimodal fusion).
Define custom neural architectures (CNNs, Transformers, Vision Transformers, ConvNeXt, Swin, SAM, CLIP).
Build multimodal AI pipelines combining text, image, video, and speech data.
Lead optimization strategies (model compression, distillation, quantization, pruning).
Architect real-time vision inference systems for edge and cloud.

MLOps & Infrastructure
Design an automated lifecycle for vision models: training → validation → deployment → monitoring → retraining.
Implement distributed training for large vision models (Detectron2, MMDetection, YOLOv8, Segment Anything, OpenMMLab).
Optimize model serving for high-throughput vision tasks using Triton Inference Server, ONNX Runtime, and TensorRT.
Build benchmarking pipelines for latency, accuracy, and cost trade-offs.

Data & Integration
Architect image/video data pipelines (preprocessing, augmentation, annotation tools, synthetic data generation).
Integrate vision AI systems with enterprise workflows (retail analytics, quality inspection, surveillance, healthcare imaging,
Skill Requirements
Core AI/ML
Frameworks: PyTorch, TensorFlow, JAX.
Generative AI: LLM fine-tuning (LoRA, PEFT), embeddings, RAG pipelines, vector DBs (Pinecone, FAISS, Milvus, Weaviate).
MLOps & Deployment: MLflow, Kubeflow, BentoML, Airflow, Ray, SageMaker, Vertex AI, Azure ML.

Computer Vision
Detection & Segmentation Models: YOLO (v5–v8), Faster R-CNN, RetinaNet, Mask R-CNN, Detectron2, OpenMMLab, MMDetection, SAM (Segment Anything Model).
Vision Transformers (ViT): ViT, DeiT, Swin Transformer, BEiT, ConvNeXt.
OCR & Document AI: Tesseract, PaddleOCR, DocTr, LayoutLM, Donut, TrOCR.
Video Understanding: SlowFast, TimeSformer, X3D, I3D, action recognition, activity detection.
3D Vision: PointNet, PointNet++, MinkowskiNet, NeRF, depth estimation, SLAM.
Multimodal Models: CLIP, BLIP, Flamingo, Kosmos, Gemini-like fusion models.
Generative Vision Models: GANs, StyleGAN, Diffusion Models (Stable Diffusion, DALL·E, Imagen), ControlNet.
Optimization Tools: TensorRT, OpenVINO, ONNX, CoreML, quantization & pruning libraries.

Data Systems & Infra
Data Pipelines: Spark, Kafka, Delta Lake, Snowflake.
Cloud & Infra: AWS (SageMa