Invited Talks

Emily M. Bender

Professor, University of Washington - Website

Data Statements: Empowering Ethical Practice and Accountability through Dataset Documentation

Joint work with Batya Friedman and Angelina McMillan-Major
Dataset documentation provides information about why a dataset was constructed, how items were selected for inclusion, and which communities or other groups of people it can be understood as representative of. The Data Statements toolkit (Bender & Friedman 2018, Bender et al. 2021, McMillan-Major et al. 2023) is designed to support language dataset creators in particular in developing appropriate documentation for their datasets. In this talk I will present an overview of the Data Statements toolkit, how it has been developed, and why the information it elicits is key to ethical practice in language dataset development and use.


Emily M. Bender is a Professor of Linguistics and an Adjunct Professor in the School of Computer Science and the Information School at the University of Washington, where she has been on the faculty since 2003. Her research interests include multilingual grammar engineering, computational semantics, and the societal impacts of language technology. She is the co-author of recent influential papers such as Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (ACL 2020) and On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 (FAccT 2021). In 2022 she was elected as a Fellow of the American Association for the Advancement of Science (AAAS).

Lilian Wanzare

Lecturer, Maseno University, Kenya - Website

Lilian Wanzare is a Lecturer in the Department of Computer Science, School of Computing and Informatics, Maseno University, Kenya. She obtained her PhD in Computational Linguistics from Saarland University, Germany. Her research interests include knowledge acquisition, supervised and semi-supervised learning, and natural language processing, particularly collecting and annotating data for building NLP tools for low-resource languages. She is a consultant providing solutions to various natural language processing challenges and is passionate about building machine learning solutions to solve local problems.

Playing Catch-Up: Towards Linguistic Annotation in Low-Resource Languages


Linguistic annotation plays a vital role in numerous Natural Language Processing (NLP) applications, facilitating tasks like Part-of-Speech (POS) tagging, parsing, and Named Entity Recognition (NER), among others. Substantial efforts have been dedicated to constructing annotated corpora for high-resource languages, leading to the development of state-of-the-art models. However, a significant challenge lies in acquiring sufficient data for low-resource languages, where limited linguistic resources hinder NLP advancements. In this talk, we explore the requirements and potential challenges for establishing comparable NLP tools in low-resource languages, considering the unavailability of annotated data for even fundamental NLP applications. This raises the question of whether alternative approaches are needed or whether researchers in low-resource language settings will forever struggle to keep up with their counterparts.

Anne Lauscher

Professor, University of Hamburg, Germany - Website

Why Fostering Inclusive NLP needs More Inclusion

Advanced natural language processing systems, such as open-domain conversational AI systems, have now reached the broad public and support millions of daily users in a variety of tasks. However, they still exhibit critical shortcomings: in particular, they exclude speakers from potentially marginalized groups that are underrepresented in our (annotated) training data. Truly societally beneficial language technology should serve everybody. Thus, we are working towards more sociodemographically inclusive NLP, for instance, via data set collection and creation. Still, here too we are creating new problems and dealing with double-edged swords. In this talk, I will outline current issues relating to cultural and subcultural exclusion, and then try to provide a critique of inclusive data set creation practices. Lacking a final conclusion, I will point to interdisciplinary research and mixed-method approaches as beneficial tools for our ongoing epistemic quest.


Anne Lauscher (ˈanə ˈlaʊ̯ʃɐ, she/her) is Associate Professor of Data Science at the University of Hamburg, where her research group investigates Conversational Artificial Intelligence (AI) systems with a focus on fair, inclusive, and sustainable communication. Previously, she was a Postdoctoral Researcher in the Natural Language Processing group at Bocconi University (Milan, Italy), where she worked on introducing demographic factors into language processing systems with the aim of improving algorithmic performance and system fairness. She obtained her Ph.D. from the Data and Web Science group at the University of Mannheim (Germany), where her research focused on the interplay between language representations and computational argumentation. During her studies, she conducted research internships at and became an independent research contractor for Grammarly Inc. (New York City, U.S.) and for the Allen Institute for Artificial Intelligence (Seattle, U.S.). Her research is regularly published at international top-tier Natural Language Processing (e.g., ACL, EMNLP) and Artificial Intelligence (e.g., AAAI) venues and has been recognized with multiple awards. In 2022, she was nominated for the Dissertation Award of the German Informatics Society and named as one of the "100 Brilliant Women in AI Ethics for 2023".