Analysing speech interruptions can help create more human-like AI chatbots

Robots talking to each other and interrupting each other

Data scientists at Imperial have developed an audio dataset of speech interruptions to enhance chatbots and gain insights into human linguistics.

In a new study, published in Data, Mr Daniel Doyle from Imperial’s Department of Computing and Dr Ovidiu Serban from the Data Science Institute explored the notoriously ambiguous area of human speech interruptions. Daniel carried out this research as part of his MSc studies in the Department of Computing.

As defined in their study, an interruption is an instance where an interrupting party intentionally attempts to take over a turn of the conversation from an interruptee and, in doing so, creates an overlap in speech.

Until now, there has been a lack of publicly available audio datasets specifically for studying interruptions, which has made it difficult for researchers to develop effective models.

By identifying and manually annotating 200 interruptions in recordings of group meetings, the two researchers built a comprehensive dataset that includes both audio files and transcripts.

This work not only contributes to the field of computational linguistics but also opens new avenues for improving chatbot interactions by enabling computer models to better recognise and respond to interruptions in real time, creating more natural conversational experiences.

A new audio dataset

Audio-based datasets are historically lacking due to the complexities in transcribing speech, a lack of high-quality data and limited access to resources.

Yet audio datasets are important for a variety of reasons, from improving machine learning models and chatbot interactions to advancing research in communication and human interaction.

In particular, previous methods for detecting interruptions in conversations either relied on artificial data or took so long to analyse speech that interruptions could not be identified quickly during a conversation.

To address these challenges and to compensate for the lack of publicly available data on speech interruptions, the team at Imperial created a new dataset for interruption classification using the Group Affect and Performance (GAP) dataset, curated at the University of the Fraser Valley in Canada.

The GAP dataset consists of 28 group meetings, totalling 252 minutes of conversational audio. The nature of these discussions welcomed frequent interruptions, which made the GAP dataset an ideal basis for a new dataset for interruption classification.

From this broader dataset, the researchers extracted 200 manually annotated interruptions from a total of 355 overlapping utterances, categorising them into true and false interruptions.
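To illustrate what an "overlapping utterance" means in practice, here is a minimal Python sketch that finds pairs of utterances from different speakers whose time spans intersect. It is not the researchers' method: the `Utterance` record, the `find_overlaps` function and the sample timings are all hypothetical, invented for this illustration. It also only finds candidates; deciding whether an overlap is a true or false interruption depends on the speaker's intent, which the researchers annotated manually.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Utterance:
    """Hypothetical transcript record; times are seconds from the start."""
    speaker: str
    start: float
    end: float
    text: str


def find_overlaps(utterances):
    """Return (earlier, later, overlap_seconds) for every pair of
    utterances by different speakers whose time spans intersect."""
    overlaps = []
    ordered = sorted(utterances, key=lambda u: u.start)
    for first, second in combinations(ordered, 2):
        if first.speaker == second.speaker:
            continue
        # Length of the shared time span; zero or negative means no overlap.
        overlap = min(first.end, second.end) - max(first.start, second.start)
        if overlap > 0:
            overlaps.append((first, second, round(overlap, 3)))
    return overlaps


# Invented example data, not taken from the GAP recordings.
meeting = [
    Utterance("A", 0.0, 4.0, "I think the first option makes sense because"),
    Utterance("C", 1.5, 1.9, "Mhmm."),
    Utterance("B", 3.2, 6.0, "No, let's go with the second one."),
]

for first, second, seconds in find_overlaps(meeting):
    print(f"{second.speaker} overlaps {first.speaker} by {seconds}s: {second.text!r}")
```

In this invented example both C and B overlap with A, but only B appears to be taking over the turn. A timing check alone cannot tell the two apart, which is why annotated examples are needed.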

True and false interruptions

The dataset created by Mr Doyle and Dr Serban helps create chatbots that can distinguish between interruptions, backchannels and background noise. It categorises overlapping speech into two classes: true interruptions and false interruptions.

False interruptions are instances of overlapping speech that do not constitute genuine interruptions. These may include backchannel responses, where the listener is engaging without attempting to take over the conversation.

Backchannels, which include affirmations like "agreed" or "mhmm," are responses from listeners that do not interrupt the speaker. By accurately identifying these distinctions, the dataset allows for more natural pauses in conversation, leading to interactions that feel more human-like.

True interruptions, by contrast, are instances where a speaker intentionally takes over the conversation, creating an overlap in speech with the current speaker. These interruptions are characterised by the intent to dominate the conversation or change the topic, often leading to a disruption in the flow of dialogue.

According to Dr Serban, a Research Fellow at the Data Science Institute, “By focusing on the intention behind the interruptions and distinguishing them between ‘true’ and ‘false’, we were able to provide a precise definition for an interruption – something that was previously difficult to clearly define.”

He added: “By publishing this new dataset, we are starting to democratise the research opportunities in human-computer interaction and human activity recognition in a field that has always been restricted to large tech companies working on voice assistants and voice models. This is the first openly available and, hopefully, not the last dataset to support this research area.”

‘Interruption Audio & Transcript: Derived from Group Affect and Performance Dataset’ by Doyle and Serban, published on 31 August 2024 in Data.