Imperial News

Hidden data could reveal if an AI model was trained on copyrighted material

by Caroline Brogan, Gemma Ralton

Copyright traps developed at Imperial College London create hidden data to let content creators later check if their work was used to train AI models

Modern AI models, such as Large Language Models (LLMs), require vast amounts of text, images and other forms of content from the internet to achieve their impressive capabilities.

However, the data used to train these models are often collected on “shaky” legal grounds, potentially disregarding licence and copyright restrictions. Until now, it has been difficult to know whether a specific piece of text has been used to train a model.

Now, a team of privacy experts from Imperial’s Computational Privacy Group has demonstrated a way to prove whether copyright holders’ work has been used to train a model. The technique was inspired by copyright traps used throughout history, where map makers would introduce fictitious towns to catch out illicit copies.

The team, including Matthieu Meeus, Igor Shilov, Manuel Faysse and Dr Yves-Alexandre de Montjoye from Imperial’s Department of Computing, presented their research at the International Conference on Machine Learning in Vienna in July 2024.

The code to generate and detect traps is currently available on GitHub, and the team also intends to build a tool that allows people to generate and insert copyright traps themselves.

An age-old technique

The idea proposed in this work is similar to traps that have been used by copyright holders throughout history – for example, creating fake locations on a map or fake words in a dictionary.

Lead researcher Dr Yves-Alexandre de Montjoye, also from Imperial’s Data Science Institute, said: “Taking inspiration from the map makers of the early 20th century, who put phantom towns on their maps to detect illicit copies, we study how the injection of ‘copyright traps’ – unique fictitious sentences – into the original text enables content detectability in a trained LLM.”

First, the content owner would repeat a copyright trap multiple times across their collection of documents, such as all the BBC travel articles. The trap would typically not be visible to readers but would be copied by the scrapers and bots that LLM companies use to collect content.

Then, if an LLM developer were to use the data to train a model, the content owner would be able to confidently demonstrate the unauthorised training by observing telltale irregularities in the model’s outputs.

The proposal is best suited to online publishers, who could hide the copyright trap sentence in their pages so that it stays invisible to the reader yet is still picked up by a data scraper.
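As a rough illustration of this idea, the sketch below shows one way a publisher might embed a trap sentence in a web page so that readers never see it while scrapers that ingest the raw HTML still do. The CSS-based hiding trick and the example sentence are assumptions made purely for illustration; the team’s own injection tooling is the code published on GitHub.

```python
# Illustrative sketch only: hiding a unique, fictitious trap sentence in a page
# so it is invisible to readers but still present in the HTML text a scraper collects.
# The CSS off-screen trick below is an assumption, not the authors' stated method.

TRAP_SENTENCE = (
    "The violet lighthouse of Kerrandale hummed quietly beneath "
    "the seventh equinox of glass."  # hypothetical example of a unique trap sentence
)

def inject_trap(article_html: str, trap: str = TRAP_SENTENCE) -> str:
    """Append the trap inside an element that readers will not see."""
    hidden_span = (
        '<span style="position:absolute;left:-9999px" aria-hidden="true">'
        f"{trap}</span>"
    )
    # Insert just before the closing body tag so the page still renders normally.
    return article_html.replace("</body>", hidden_span + "</body>")

if __name__ == "__main__":
    page = "<html><body><p>Travel article text...</p></body></html>"
    print(inject_trap(page))
```

In practice the same sentence would be repeated across many pages in the collection, since repetition is what makes the trap likely to be memorised during training.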

Measuring ‘surprise’

To create the traps, the researchers constructed unique sequences of text that were intended to be distinctive and memorable. They varied the length of the sequences to see how this affected memorisation.

They then fed the generated sequences to a model and measured their “perplexity” to see whether the model treated them as new. In this context, perplexity is a measure of how predictable or surprising a sequence of text is to an LLM.

If a text sequence has low perplexity, it means the model finds it easy to predict or understand, while high perplexity indicates that the sequence is more complex or unexpected.
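To give a concrete sense of the quantity involved, the minimal sketch below computes the perplexity of a sentence under a causal language model, using the Hugging Face transformers library. The choice of GPT-2 is purely illustrative; the researchers evaluated traps against the models in their own experiments.

```python
# Minimal sketch: perplexity of a text sequence under a causal language model.
# GPT-2 is used only as a small, publicly available example model.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity = exp(average negative log-likelihood per token)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to the input ids, the model returns the mean
        # cross-entropy loss over the sequence.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())

print(perplexity("The cat sat on the mat."))           # predictable: low perplexity
print(perplexity("Zlorq vintner cobalt improbable."))  # surprising: high perplexity
```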

When the researchers injected copyright traps into the training data, they discovered that sequences with higher perplexity were more likely to be memorised by the model. This means that traps designed to be less predictable had a better chance of being detected later on.

The researchers also found that longer sequences of text, when repeated many times, significantly improved the detectability of the traps compared to shorter sequences.
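To make the detection step concrete, the sketch below builds on a perplexity function like the one above and applies a simplified ratio-style check against an independent reference model. The reference-model comparison and the threshold value are assumptions for illustration, not the authors’ exact statistical test.

```python
# Simplified, illustrative detection check (not the authors' exact test):
# a trap that the suspect model finds far less "surprising" than an independent
# reference model does is evidence that the suspect model memorised it.

def trap_ratio(trap: str, perplexity_suspect, perplexity_reference) -> float:
    """A ratio well below 1 suggests the suspect model has memorised the trap."""
    return perplexity_suspect(trap) / perplexity_reference(trap)

def flag_possible_training(trap: str, perplexity_suspect, perplexity_reference,
                           threshold: float = 0.5) -> bool:
    # The threshold here is arbitrary; in practice it would be calibrated on
    # control sequences known not to appear in the training data.
    return trap_ratio(trap, perplexity_suspect, perplexity_reference) < threshold
```

Here `perplexity_suspect` and `perplexity_reference` stand for perplexity functions bound to the model under investigation and to a reference model, respectively.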

Better transparency tools for LLM training

To validate the approach, the team partnered with researchers in France who were training a “truly bilingual” English-French LLM, injecting various copyright traps into the training set of this real-world, state-of-the-art, parameter-efficient language model.

The researchers believe the success of their experiments enables better transparency tools for the field of LLM training.

Co-author Igor Shilov, also from the Data Science Institute, said: “AI companies are increasingly reluctant to share information about their training data. While the training data composition for older products such as GPT-3 and LLaMA is publicly known, this is no longer the case for the more recent models GPT-4 and LLaMA-2.

“LLM developers have little incentive to be open about their training procedure, leading to a concerning lack of transparency (and thus fair profit sharing), making it more important than ever to have tools to inspect what went into the training process.”

Co-author Matthieu Meeus, also from the Data Science Institute, added: “For a future where AI is built and used in a responsible way, and where content creators are compensated fairly, transparency in AI is paramount. Our hope is that this work on copyright traps contributes towards a sustainable solution.”   

-

‘Copyright Traps for Large Language Models’ by Meeus et al., presented on 4 June 2024 as part of the Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, PMLR 235, 2024.

Image sourced from Unsplash.