Job Summary
- MLOps Expertise: Strong knowledge of ML lifecycle management, pipeline automation, and monitoring tools.
- Cloud Platforms: Hands-on experience with AWS (SageMaker, Lambda, ECS/EKS), Snowflake, and related services.
- Programming: Proficiency in Python; familiarity with ML frameworks (PyTorch, TensorFlow).
- Containerization & Orchestration: Experience with Docker and Kubernetes for scalable deployments.
- CI/CD Tools: Knowledge of GitHub Actions, Jenkins, or similar tools for automated workflows.
- Data Engineering: Ability to work with SQL and integrate data from multiple sources.
- Pipeline Development & Automation: Design and implement automated ML pipelines for model training, testing, deployment, and monitoring.
- Model Deployment & Monitoring: Deploy AI/ML models into production environments using containerization and orchestration tools; ensure performance and reliability.
- Cloud Infrastructure Management: Configure and optimize cloud resources (AWS SageMaker, S3, Bedrock) for scalable ML workflows.
- Data Integration: Collaborate with Data Scientists to streamline data ingestion and transformation for model readiness.
- CI/CD for ML: Implement continuous integration and delivery practices tailored for ML workflows.
- Performance Optimization: Monitor model performance, retrain as needed, and manage versioning for reproducibility.
- Collaboration: Work closely with Data Scientists to translate experimental models into production-ready solutions.
Key Responsibilities
2. Apply DevOps practices with Jenkins, GitLab CI/CD, CircleCI, and GitHub Actions to streamline CI/CD for machine learning workflows and monitor pipeline health.
3. Utilize infrastructure-as-code tools such as Terraform and AWS CloudFormation to provision and manage scalable cloud resources for ML workloads.
4. Integrate monitoring solutions like Prometheus, Grafana, ELK Stack, and Fluentd to track model performance, system metrics, and log analytics in production environments.
5. Ensure process compliance by using Git, GitHub, GitLab, and Bitbucket for version control and code management within the team.
6. Participate in technical discussions and feasibility studies to evaluate technical alternatives and support architecture best practices for MLOps solutions.
7. Prepare and submit status reports to highlight progress, minimize risks, and support project closure activities.
Skill Requirements
2. Solid understanding of DevOps tools such as Jenkins, GitLab CI/CD, CircleCI, and GitHub Actions for workflow automation.
3. Solid experience with Python for scripting, data processing, and ML pipeline development.
4. Solid knowledge of infrastructure-as-code tools like Terraform and AWS CloudFormation for cloud resource management.
5. Solid skills in monitoring and logging tools including Prometheus, Grafana, ELK Stack, and Fluentd.
6. Solid familiarity with version control systems such as Git, GitHub, GitLab, and Bitbucket.
7. Solid ability to participate in technical discussions and support process compliance within the team.
Other Requirements
2. AWS Certified DevOps Engineer
3. Google Professional Machine Learning Engineer