Job Summary
Key Responsibilities
2. Integrate and process large-scale streaming data with Apache Kafka and Spark, enabling real-time feature engineering and data transformation for ML pipelines.
3. Evaluate machine learning models using cross-validation, ROC/AUC, precision/recall, F1-score, and confusion matrices to verify performance and reliability.
4. Apply NLP techniques using NLTK and spaCy to preprocess and analyze textual data for forecasting applications.
5. Optimize model training and deployment workflows using Apache Airflow and Hadoop, maintaining efficiency and scalability in production environments.
6. Collaborate within the development team to advocate and implement coding standards and best practices in Python and ML model development.
7. Prepare technical documentation and status reports to communicate progress, risks, and mitigation strategies for assigned modules.
Skill Requirements
2. Solid experience with Python, NumPy, pandas, scikit-learn, TensorFlow, PyTorch, XGBoost, and LightGBM for model development and evaluation.
3. Solid understanding of time series analysis and forecasting methodologies.
4. Solid skills in Apache Spark and Kafka for distributed data processing and streaming analytics.
5. Solid knowledge of ML model evaluation metrics and techniques, including cross-validation and statistical analysis.
6. Solid ability to apply NLP frameworks such as NLTK and spaCy for text data processing.
Other Requirements
2. TensorFlow Developer Certificate
3. Apache Spark Developer Certification