Job Summary
Key Responsibilities
2. Integrate and process large-scale streaming data with Apache Kafka and Spark, enabling real-time model training and inference.
3. Evaluate machine learning models using cross-validation, ROC/AUC, Precision/Recall, F1-score, and confusion matrix to ensure robust predictive performance.
4. Apply advanced NLP techniques with NLTK and SpaCy to extract and preprocess relevant features for forecasting tasks.
5. Optimize model deployment pipelines using Apache Airflow and Hadoop, ensuring efficient workflow orchestration and data management.
6. Collaborate with the development team to troubleshoot, refine, and enhance ML solutions, ensuring adherence to best practices and coding standards.
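The evaluation workflow named in the responsibilities above can be sketched in scikit-learn; the dataset, model, and fold count here are purely illustrative, not part of the role:

```python
# Minimal sketch of cross-validated evaluation with ROC/AUC, precision,
# recall, F1, and a confusion matrix. The synthetic dataset and the
# RandomForest model are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate, cross_val_predict
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross-validation over several scoring metrics at once.
scores = cross_validate(clf, X, y, cv=5,
                        scoring=["roc_auc", "precision", "recall", "f1"])
for metric in ["roc_auc", "precision", "recall", "f1"]:
    print(metric, scores[f"test_{metric}"].mean())

# Confusion matrix built from out-of-fold predictions, so every sample
# is scored by a model that never saw it during training.
y_pred = cross_val_predict(clf, X, y, cv=5)
print(confusion_matrix(y, y_pred))
```

Using `cross_val_predict` for the confusion matrix keeps the evaluation honest: each prediction comes from a fold in which that sample was held out.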
Skill Requirements
2. Strong skills in Python programming, including NumPy, pandas, scikit-learn, TensorFlow, PyTorch, XGBoost, and LightGBM.
3. In-depth knowledge of distributed data processing with Apache Spark and real-time data integration using Apache Kafka.
4. Solid understanding of ML model evaluation metrics and techniques, including cross-validation and performance optimization.
5. Experience with workflow orchestration tools such as Apache Airflow and big data platforms like Hadoop.
6. Advanced proficiency in NLP libraries, including NLTK and spaCy.
Other Requirements
2. Certifications such as the TensorFlow Developer Certificate
3. AWS Certified Machine Learning – Specialty certification