Are you passionate about fair and robust Natural Language Processing (NLP), data, and computational social science/sociolinguistics? Join our ambitious new DataDivers project, funded by an ERC Starting Grant, and help us make NLP models fairer and more robust.
Your job
The rise of Large Language Models (LLMs) and the availability of massive datasets have sparked a revolution in the field of NLP. However, numerous studies have pointed to serious flaws: NLP models encode societal biases and show disparate performance across demographic groups. Thus, current models can and do cause real harm when deployed in society.
In the field of NLP, there is a growing recognition that data quality is key to better language models, yet we know surprisingly little about the link between data and model behaviour. In this project, we will develop methods to measure the diversity of NLP datasets, assess the impact of diversity on NLP models, and improve data collection and model training.
As a PhD candidate in our new DataDivers project, you will join the project team led by Dr Dong Nguyen. The team will consist of two PhD candidates and two postdocs.
You will develop innovative methods to measure the diversity of NLP datasets. A major focus will be on measuring dataset diversity from a sociolinguistic perspective, considering language variation (such as styles and dialects) and combining (socio)linguistic insights with neural language modelling. You will also draw on relevant disciplines, particularly the social sciences, that have developed measurement approaches for diversity. Furthermore, you will carry out experiments to assess the impact of data diversity on NLP models, with a focus on fairness and robustness, and investigate ways to leverage data diversity to improve NLP models!
This position offers you the opportunity to work on fundamental NLP research. As a PhD candidate, you will have the freedom to shape the project according to your own interests. Your responsibilities will also include contributing to teaching activities, such as supervising Bachelor's and Master's theses or assisting in lab sessions.