Optimising testing for disease surveillance with machine learning

20 December 2024

Computer image of network showing nodes and clusters

A new machine-learning-informed strategy could help public health leaders design better surveillance during a disease outbreak.

A new paper, co-authored by Dr Elizaveta Semenova and published in Proceedings of the National Academy of Sciences, may help optimise testing strategies for infectious disease surveillance.

Alongside Dr Semenova, the study included researchers from the University of Oxford’s Pandemic Sciences Institute, Biology and Computer Science departments, as well as colleagues from the Oxford Martin Programme on Pandemic Genomics, Royal Veterinary College and University of California, Los Angeles.

When epidemics and pandemics occur, screening the population for infection is essential to understand how disease is spreading.

Testing resources, however, are always finite, and the question of how to allocate tests to maximise the information gained about disease distributions remains difficult.

The study proposes a novel machine learning strategy (“policy”), Selection by Local-Entropy (LE), to guide the selection of testing sites. When tested in a range of simulated outbreak scenarios, LE mostly outperformed other testing policies considered by the authors.

Dr Elizaveta Semenova, Lecturer at the School of Public Health, Imperial College London, said: "By applying active learning techniques to disease surveillance, we can prioritise testing in a way that maximises the insight gained about the outbreak’s spread, even under tight resource constraints. This approach aims to help public health officials better understand where infections are occurring and allocate their limited testing capacity more effectively."

The framework created by this study will allow researchers and policymakers to design surveillance systems for infectious disease more adaptively.

Methods

Active Learning (AL) is an iterative form of machine learning that aims to maximise a model’s performance by strategically selecting the most informative data points that need labelling.

The new study tested eight AL policies, including LE, exploring the performance of different test allocation strategies in simulated outbreak scenarios.

Professor Moritz Kraemer, Professor of Epidemiology & Data Science at the Department of Biology and Pandemic Sciences Institute at the University of Oxford, said: “Data and robust understanding of the transmission process early in epidemics is essential for effective public health policies. Our study provides a step towards more rational implementation of public health policies.”

The infection status of an initial node in an outbreak model was revealed; then AL policies were iteratively deployed to determine which nodes needed to be labelled (in this case tested) as infected or not infected. The goal of the exercise was to maximise the model’s predictive performance while using the smallest possible amount of labelled data.

In real outbreak scenarios, this would mean testing locations in a way that would minimise the resources used while still providing an accurate picture of how disease is spreading.
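
The iterative procedure described above can be illustrated with a minimal sketch. It is not the study's implementation: the toy graph, the `predict` function (a smoothed share of infected tested neighbours standing in for a real outbreak model) and the plain highest-entropy selection rule are all assumptions made for illustration.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a predicted infection probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def predict(graph, labels):
    """Hypothetical stand-in for an outbreak model (an assumption, not the paper's model)."""
    probs = {}
    for node, neighbours in graph.items():
        if node in labels:
            # Tested nodes are known with certainty.
            probs[node] = float(labels[node])
            continue
        tested = [labels[n] for n in neighbours if n in labels]
        # Smoothed share of infected tested neighbours; 0.5 when none are tested.
        probs[node] = (sum(tested) + 1) / (len(tested) + 2)
    return probs

def active_learning(graph, truth, start, budget):
    """Reveal one node, then spend `budget` tests on the most uncertain nodes."""
    labels = {start: truth[start]}
    for _ in range(budget):
        probs = predict(graph, labels)
        candidates = [n for n in graph if n not in labels]
        if not candidates:
            break
        # Uncertainty-based policy: test the node with the highest entropy.
        pick = max(candidates, key=lambda n: binary_entropy(probs[n]))
        labels[pick] = truth[pick]  # "testing" reveals the true status
    return labels

# Toy chain of four locations: A and B infected, C and D not.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
truth = {"A": 1, "B": 1, "C": 0, "D": 0}
labels = active_learning(graph, truth, start="A", budget=2)
```

In this toy setting the loop stops once the testing budget is spent; a fuller version would also score the model's predictions for the untested nodes after each round.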

Adapting testing approaches

The newly developed LE is an uncertainty-based policy, meaning it selects nodes for testing based on the uncertainty of the outbreak model’s predictions. The more uncertain predictions are for a certain node, the more informative testing is likely to be.

Unlike other policies of this type, LE considers not only the uncertainty of each node it selects for testing but also that of the nodes connected to it.
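
As a rough sketch of this idea, a node can be scored by its own predictive entropy plus the entropies of its direct neighbours. The article does not give the exact formula used in the paper, so the unweighted sum, the function names and the example probabilities below are assumptions for illustration only.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a predicted infection probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def local_entropy(node, graph, probs):
    """Own entropy plus neighbours' entropies (assumed unweighted sum)."""
    own = binary_entropy(probs[node])
    around = sum(binary_entropy(probs[n]) for n in graph[node])
    return own + around

def select_by_local_entropy(graph, probs, tested):
    """Pick the untested node with the highest local-entropy score."""
    candidates = [n for n in graph if n not in tested]
    return max(candidates, key=lambda n: local_entropy(n, graph, probs))

def select_by_own_entropy(graph, probs, tested):
    """Baseline for comparison: only the node's own uncertainty counts."""
    candidates = [n for n in graph if n not in tested]
    return max(candidates, key=lambda n: binary_entropy(probs[n]))

# Hypothetical network and model predictions (made up for this example).
graph = {
    "hub": ["x", "y"],
    "x": ["hub"],
    "y": ["hub"],
    "lone": ["z"],
    "z": ["lone"],
}
probs = {"hub": 0.3, "x": 0.5, "y": 0.5, "lone": 0.5, "z": 0.0}
tested = {"z"}

le_choice = select_by_local_entropy(graph, probs, tested)
own_choice = select_by_own_entropy(graph, probs, tested)
```

Here the local score favours `hub`, whose neighbourhood is uncertain as a whole, whereas the baseline picks a node whose own prediction is the most uncertain regardless of its surroundings.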

Dr Mengyan Zhang, Research Associate at the University of Oxford Department of Computer Science, said: “Our active learning strategy is designed to effectively explore local uncertainties within the mobility network. By leveraging Selection by Local-Entropy, we address a balance between exploitation and exploration, which enables more efficient and targeted testing given limited testing resources.”

In real-life outbreaks, the deployment of different testing policies should depend on resources and budget as well as outbreak structure and stage. When resources are constrained, the study argues, frequent exploratory testing yields better results by testing locations that are at the periphery of the outbreak and for which predictions are most uncertain.

Joseph Tsui, DPhil student in the Department of Biology and Oxford Martin School Programme on Pandemic Genomics, said: “Our work opens up exciting new avenues for future research. By building on the framework we've developed, we hope to explore how surveillance policies can be tailored for specific pathogens with unique transmission characteristics, such as varying incubation periods or different modes of transmission.

“Ultimately, our goal is to develop a framework that will provide actionable insights and recommendations in real-time, enabling policymakers to respond more effectively during emerging outbreaks.”

Read the study in Proceedings of the National Academy of Sciences.

Reporter

Jack Stewart

School of Public Health

Email: jack.stewart@imperial.ac.uk