Context and Objectives:
Recent advances in natural language processing and machine learning have led to powerful large language models (LLMs). These LLMs exhibit exceptional text generation capabilities and have shown potential for diverse applications. The objective of the project is to design and implement AI models and pipelines that automatically convert, extract and analyse data (speech, text) for specific real-world data classification use cases. The application is built on a large language model backbone.
Tasks include:
- Processing & conversion of speech datasets (pre-processing, sound capturing, feature extraction)
- Build, train & validate AI models that extract content information from speech/text data by using LLM prompt engineering
- Includes:
  - topic/intent classification, summarization, speaker identification, dialogue state tracking
  - applying time series segmentation methods, preferably using LLMs
- Analyse conversation phases and speech & text patterns/features, perform statistical analysis
General requirements:
- Self-motivated scientist/PhD graduate with a passion for AI-related projects on LLMs, natural language processing, speech recognition, data science.
- Holds a PhD or master's degree in a relevant field: AI, natural language processing, data science, speech recognition.
- Knowledge of and experience with time series analysis methods, LLMs, dialogue state tracking.
- Prior experience working on deep learning projects, speech recognition software, LLM-based applications (e.g. conversational agents), data mining
Specific technical requirements:
- Excellent experience, knowledge, and skills in AI models for speech recognition & speech transcription (Whisper), machine learning and data science (time series analysis)
- Excellent experience, knowledge, and hands-on skills in programming languages, particularly Python, C++
- Experienced in LangGraph, LangChain, RAG, Azure, Git, Docker, SQL, PyTorch
Musts:
- Data Scientist
- PhD preferred, or AI background
- Speech recognition: 2-3 years' experience
- Deep Learning
- LLMs
Pluses:
- Python
- Deep understanding of LLMs
Contract information:
- Location: Fully remote, 40 hrs per week - Client based in Belgium
- Status: Open to freelancers
- LOA: 12 months + potential extension
- Start: ASAP
