Imperial News

New computer model helps assess privacy risks of networked data collection

by Gemma Ralton

Privacy experts from Imperial have created a new computational tool to estimate the number of people at risk from data collected on social networks.

Researchers from Imperial’s Data Science Institute, including former PhD student Dr Florimond Houssiau and Associate Professor of Applied Mathematics and Computer Science Dr Yves-Alexandre de Montjoye, created a new computer model to help determine the reach of ‘networked data’ – data shared between multiple devices that are connected to each other through a network such as the internet.

In the study, published in Patterns, the privacy experts validated their model by applying it to real-world examples including the Cambridge Analytica scandal in 2018, the surveillance of mobile phone networks and close proximity tracking in large cities.

The model confirmed that Cambridge Analytica collected 68.0M Facebook profiles from 270,000 compromised accounts – demonstrating the need for better tools to evaluate the impact of data collection on privacy.

Dr Yves-Alexandre de Montjoye said, “Our research provides a critical first step in understanding the reach of modern data collection methods, specifically shedding light on how network effects can have a strong detrimental impact on our privacy.

“By developing this new model based on node-based intrusions, we hope to inspire further research into effective privacy protection mechanisms and policies that can mitigate the risks associated with networked data collection.”

The research was conducted in partnership with Piotr Sapiezynski from Northeastern University and Laura Radaelli and Erez Shmueli from the Department of Industrial Engineering at Tel Aviv University.

More connected than ever before

We are more connected today than ever before: the average worldwide degree of separation has shrunk dramatically, from 6 steps in 1969 to 3.5 steps today.

However, this connectedness can affect our right to privacy. The data collected often intrinsically relate to more than one person: for example, a text sent between two people, or close-proximity data collected through Bluetooth.

From a data protection perspective, this means that even though data about only a handful of people are collected, information about many more people might be included in the dataset.

Modern data protection laws such as the EU General Data Protection Regulation (GDPR) help protect an individual’s privacy, ensuring that the data collected are relevant for the purposes of processing – something known as proportionality.

In this paper, researchers propose a new model to evaluate the reach and therefore proportionality of modern data collections and attacks.

Node-intrusions on social networks: learning from the Cambridge Analytica scandal

The model relies on two specific measures, node-observability and edge-observability, to understand the reach of modern data collections. Here, ‘nodes’ refer to the individual devices or people in a network that the data relate to, while ‘edges’ refer to the connections or links between two nodes in the network.

In simple terms, node-observability measures how much data an attacker can collect about individual nodes, and edge-observability measures how much data an attacker can collect about the relationships between nodes.
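
To make the two measures concrete, here is a minimal Python sketch on a toy friendship network. It is an illustration only, not the researchers' model: it assumes that a node counts as observed if it is compromised or directly connected to a compromised node, and that an edge counts as observed if at least one of its endpoints is compromised. The function name `observability` and the example network are hypothetical.

```python
from itertools import chain


def observability(edges, compromised):
    """Return (node_observability, edge_observability) as fractions.

    Assumed definitions (illustrative, not taken from the paper):
    - an edge is observed if at least one endpoint is compromised;
    - a node is observed if it is compromised or sits on an observed edge.
    Expects a non-empty list of (node, node) pairs.
    """
    compromised = set(compromised)
    nodes = set(chain.from_iterable(edges)) | compromised

    observed_edges = [
        (u, v) for u, v in edges if u in compromised or v in compromised
    ]
    observed_nodes = compromised | set(chain.from_iterable(observed_edges))

    return len(observed_nodes) / len(nodes), len(observed_edges) / len(edges)


# Toy network: six people, six friendships, one compromised account.
edges = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
    ("bob", "cat"), ("dan", "eve"), ("eve", "fay"),
]
node_obs, edge_obs = observability(edges, compromised={"ann"})
print(f"node-observability: {node_obs:.2f}")  # 4 of 6 people
print(f"edge-observability: {edge_obs:.2f}")  # 3 of 6 friendships
```

In this toy case, compromising a single account out of six exposes two thirds of the people and half of the friendships, which is the kind of network effect the article describes.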

The 2018 Cambridge Analytica incident was the first major case of a node-observability attack based on node-intrusions on social networks.

By using their model, the privacy experts were able to independently quantify the node-observability of the Facebook network used in the Cambridge Analytica attack, demonstrating that the company would have had access to 68.0M profiles from just the 270,000 individuals who installed the illegitimate app.

Lead author Dr Houssiau explains: “You could be very careful with your privacy, but if your friends are not, then you are vulnerable. Network effects make us vulnerable to node-based intrusions on even a small fraction of the network.”

Moving forward

“Moving forward, we hope this work can help evaluate the scope of data collection mechanisms and technologies and ensure their proportionality.”

Overall, the researchers hope their work will contribute to a better understanding of how networked data collection works and how it affects privacy, ultimately leading to more effective privacy protection mechanisms and policies in the coming years.


'Detrimental network effects in privacy: A graph-theoretic model for node-based intrusions' by Houssiau et al., published on 13 January 2023 in Patterns.