Job Summary
To apply advanced data science techniques, build scalable models, and generate actionable insights that drive business performance, optimize processes, and support data-driven decision-making.
Job Title: Speech AI Engineer
Role summary
We are looking for a senior hands-on expert who can take speech systems from raw audio to reliable production features. You will build and improve core speech capabilities such as ASR, TTS, voice conversion, and speech-to-speech workflows, and you will also own the engineering work that makes them fast, scalable, and measurable in the real world. This role is a strong fit if you enjoy the full stack of speech AI: signal processing intuition, modern deep learning, decoding and streaming constraints, and practical deployment trade-offs.
What you will own
1) Speech modeling that ships
• Build, train, and iterate on ASR models for real-world conditions such as conversational speech, accents, noise, and far-field audio, with strong offline and online evaluation discipline.
• Develop and improve TTS systems that are natural, low-latency, and stable on speaker identity and prosody, with production-quality inference constraints.
• Work on voice conversion and accent conversion when needed, preserving intelligibility, naturalness, and speaker identity in streaming settings.
2) Decoder and streaming engineering
• Design and implement decoding stacks using proven libraries and patterns, including Kaldi and OpenFST, and features like custom vocabulary injection, language model rescoring, and beam search tuning.
• Build streaming inference systems with strict latency budgets and predictable behavior at scale, including monitoring and continuous improvement loops.
3) Speech analysis and speech intelligence
• Deliver speech analytics building blocks such as VAD, diarization, speaker recognition, and quality analytics that improve end-to-end product outcomes.
• Design robust evaluation harnesses and datasets for real user scenarios, including domain adaptation and behavior tuning across use cases.
4) GenAI and LLM integration for voice experiences
• Integrate speech components into LLM-based systems, including cascaded ASR plus LLM plus TTS pipelines, and drive joint optimization where it materially improves product quality.
• Build or extend speech generation capabilities including voice cloning, controllable prosody, and modern generative architectures where relevant to the roadmap.
5) Production deployment and operational excellence
• Own end-to-end delivery: prototyping, ablations, training, evaluation, optimization, deployment, and post-launch monitoring.
• Partner closely with product and platform teams to integrate models into real-time systems and maintain reliability, uptime, and quality under production traffic.
Required qualifications
• 6+ years building production-grade speech or audio ML systems, or equivalent depth through research plus shipped production impact.
• Strong programming ability in Python, plus comfort in C or C++ for performance-critical components.
• Proven expertise in deep learning for speech (PyTorch or TensorFlow) and practical model training and serving.
• Solid fundamentals in speech and audio, including signal processing concepts and real-world acoustic variability.
• Experience deploying models into real-time or high-throughput systems, including evaluation, scalability, and production reliability.
Strongly preferred.
Key Responsibilities
2. Design, test, and refine algorithms for data processing, feature extraction, and pattern detection.
3. Perform exploratory data analysis, mining, and visualization to identify trends and business opportunities.
4. Collaborate with business and technology teams to translate data insights into strategic solutions.
5. Utilize deep learning, statistical modeling, and big data tools to enhance analytical capabilities.
6. Ensure data quality, governance, and best practices in model development and deployment.
7. Optimize and automate data workflows to improve efficiency and scalability.
8. Stay updated with emerging data science technologies and best practices to drive innovation.