2025-06-18 – February Breakout Room
Citizen-Dataset Lab is a new laboratory with the purpose of creating participatory datasets to shape technologies for and with citizens. In the first edition of this laboratory, we are creating a dataset of online misinformation, involving people in collecting untrustworthy news, annotating its content, and discussing how this dataset can be used to shape software that supports them in their search for information.
Collecting data is a crucial step in the development of AI tools. The most common approach to data creation is scraping user-generated content and behaviours, mainly online, and relying on crowdsourcing platforms or on people close to the researchers (colleagues, students, and so on) for annotation. Recently, however, some scholars have proposed alternative methods that involve people at different steps and levels of AI-based service development (Delgado et al., 2023). Following their proposal, we designed the Citizen-Dataset Lab, a laboratory whose purpose is to create datasets together with people, involving them in the creation of the datasets used for training models and guaranteeing that their opinions and decisions are embedded in the annotated resource. Indeed, current technology is designed and delivered by W.E.I.R.D. (Western, Educated, Industrialized, Rich, and Democratic) researchers without consulting the real needs of citizens or including perspectives coming from disadvantaged communities (Dignum, 2023).
The design of our lab is based on five principles: (1) Ensure diversity in dataset creation and annotation; (2) Promote participation, empowering people to decide which contents must be included in a dataset and explaining to them how technologies use data; (3) Consider subjectivities in the creation of the annotation schema and guidelines; (4) Democratize technologies by co-shaping the service with people, collecting impressions about the annotation process, and discussing the potential uses of the developed resource; and (5) Embrace open source: datasets and technologies resulting from this laboratory will be open source and available to the community.
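As a purely illustrative sketch of principle (3), the short Python example below shows one possible way an annotation record could preserve each participant's judgment instead of collapsing everything into a single "gold" label. The field names, label values, and helper methods are hypothetical assumptions for illustration, not the lab's actual schema.

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Annotation:
    annotator_id: str   # pseudonymous participant identifier (hypothetical field)
    label: str          # e.g. "misleading", "satire", "trustworthy" (hypothetical labels)
    rationale: str = "" # free-text justification given by the participant


@dataclass
class NewsItem:
    url: str
    language: str
    annotations: list[Annotation] = field(default_factory=list)

    def label_distribution(self) -> Counter:
        # Report how often each label was chosen, without discarding dissenting views.
        return Counter(a.label for a in self.annotations)


item = NewsItem(url="https://example.org/article", language="it")
item.annotations.append(Annotation("p01", "misleading", "headline overstates the study"))
item.annotations.append(Annotation("p02", "trustworthy", "sources are cited"))
print(item.label_distribution())  # Counter({'misleading': 1, 'trustworthy': 1})

Keeping the full distribution of labels, rather than a majority vote, is one way a schema could let downstream models and discussions reflect the disagreement among participants.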
The first edition of the Citizen-Dataset Lab (February–May) focuses on misinformation. The spread of misinformation is strongly related to the growth of ideological polarization in the EU (Vasist et al., 2024). Communities exposed to opposing media diets become increasingly conflictual in online conversations, with worrying effects in the real world. Therefore, we believe that machines should not classify news as true or false, but should support people in grasping the complexity of our media environment and in understanding other views of the world. In this context, the Citizen-Dataset Lab will help us shape new features of Debunker-Assistant, an open-source product for fighting misinformation (Capozzi-Lupi et al., 2023), by building a multilingual dataset that makes Debunker-Assistant better and fairer in supporting people against misinformation.
References:
Delgado, F., Yang, S., Madaio, M., & Yang, Q. (2023, October). The participatory turn in AI design: Theoretical foundations and the current state of practice. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (pp. 1-23).
Dignum, V. (2023). Responsible Artificial Intelligence: Recommendations and Lessons Learned. In: Eke, D.O., Wakunuma, K., Akintoye, S. (eds) Responsible AI in Africa. Social and Cultural Studies of Robots and AI. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-031-08215-3_9
Vasist, P. N., Chatterjee, D., & Krishnan, S. (2024). The polarizing impact of political disinformation and hate speech: a cross-country configural narrative. Information Systems Frontiers, 26(2), 663-688.
Capozzi-Lupi, A. T. E., Cignarella, A. T., Frenda, S., Lai, M., Stranisci, M. A., & Urbinati, A. (2023). Debunker Assistant: a support for detecting online misinformation. In Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023) (Vol. 3596, pp. 1-5).
Marco Antonio Stranisci is a researcher at the University of Turin, Department of Computer Science, and co-founder of aequa-tech, an NLP startup based in Turin. His main research interest is bias detection through the combination of NLP and Semantic Web technologies.