Researchers discussing at the DSI Squared event

We’re kicking off the new academic year with a seminar by Dr Blake Miller of the LSE Data Science Institute on the Social and Ethical Implications of Data Scarcity and Data Drift in Large Language Models. The seminar is part of our ongoing Unsolved Problems Seminar Series for the DSI Squared partnership.

The Unsolved Problems Seminar Series aims to foster innovations by bridging the gap between social sciences, computer sciences and STEM subjects through presenting unsolved problems and crowdsourcing solutions from experts across these fields.

Date: 05 October 2023
Time: 12:30-13:30
Location: Data Science Institute, Imperial College London, William Penney Laboratory, South Kensington Campus, SW7 2AZ

Speaker: Dr Blake Miller 

Abstract: 
In this project, I investigate the effects of behavioral changes in data producers/providers due to the swift introduction and widespread adoption of powerful large language model (LLM) tools. I examine the impact of their use on the quality and quantity of data produced on platforms whose content is commonly used to train these models (e.g., Wikipedia, StackOverflow, Quora). I discuss the potential challenges arising from data drift and domain mismatch resulting from this behavioral shift, specifically concerning safety, content moderation, and the factual accuracy of LLM outputs. This project aims to highlight the extent of behavior change among content creators and to emphasize the potential risk of LLMs becoming less reliable due to the scarcity of non-synthetic data.

Registration is now closed.