Publications

Journal article

Greenbury S, Barahona M, Johnston I, 2020,

HyperTraPS: Inferring probabilistic patterns of trait acquisition in evolutionary and disease progression pathways

, Cell Systems, Vol: 10, Pages: 39-51, ISSN: 2405-4712

The explosion of data throughout the biomedical sciences provides unprecedented opportunities to learn about the dynamics of evolution and disease progression, but harnessing these large and diverse datasets remains challenging. Here, we describe a highly generalisable statistical platform to infer the dynamic pathways by which many, potentially interacting, discrete traits are acquired or lost over time in biomedical systems. The platform uses HyperTraPS (hypercubic transition path sampling) to learn progression pathways from cross-sectional, longitudinal, or phylogenetically-linked data with unprecedented efficiency, readily distinguishing multiple competing pathways, and identifying the most parsimonious mechanisms underlying given observations. Its Bayesian structure quantifies uncertainty in pathway structure and allows interpretable predictions of behaviours, such as which symptom a patient will acquire next. We exploit the model’s topology to provide visualisation tools for intuitive assessment of multiple, variable pathways. We apply the method to ovarian cancer progression and the evolution of multidrug resistance in tuberculosis, demonstrating its power to reveal previously undetected dynamic pathways.

Journal article

Liu Z, Barahona M, 2020,

Graph-based data clustering via multiscale community detection

, Applied Network Science, Vol: 5, Pages: 1-20, ISSN: 2364-8228

We present a graph-theoretical approach to data clustering, which combines the creation of a graph from the data with Markov Stability, a multiscale community detection framework. We show how the multiscale capabilities of the method allow the estimation of the number of clusters, as well as alleviating the sensitivity to the parameters in graph construction. We use both synthetic and benchmark real datasets to compare and evaluate several graph construction methods and clustering algorithms, and show that multiscale graph-based clustering achieves improved performance compared to popular clustering methods without the need to set externally the number of clusters.
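
As a rough illustration of the graph-construction step mentioned in this abstract, the sketch below builds a symmetrised k-nearest-neighbour graph from a set of points. The choice of a kNN construction, the function name `knn_graph` and the parameter `k` are assumptions made for illustration; the paper compares several constructions, and the Markov Stability community detection step is not shown.

```python
import math


def knn_graph(points, k):
    """Return the undirected edge set of a symmetrised kNN graph.

    Hypothetical helper: each point is linked to its k nearest
    neighbours (Euclidean distance), and an edge is kept if either
    endpoint selected the other.
    """
    n = len(points)
    edges = set()
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            # Store each undirected edge once, as (smaller, larger) index.
            edges.add((min(i, j), max(i, j)))
    return edges
```

A small `k` keeps the graph sparse but can disconnect it, which is one reason the resulting clustering can be sensitive to graph-construction parameters.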

Journal article

Tonn MK, Thomas P, Barahona M, Oyarzún DA et al., 2020,

Computation of Single-Cell Metabolite Distributions Using Mixture Models.

, Front Cell Dev Biol, Vol: 8, ISSN: 2296-634X

Metabolic heterogeneity is widely recognized as the next challenge in our understanding of non-genetic variation. A growing body of evidence suggests that metabolic heterogeneity may result from the inherent stochasticity of intracellular events. However, metabolism has been traditionally viewed as a purely deterministic process, on the basis that highly abundant metabolites tend to filter out stochastic phenomena. Here we bridge this gap with a general method for prediction of metabolite distributions across single cells. By exploiting the separation of time scales between enzyme expression and enzyme kinetics, our method produces estimates for metabolite distributions without the lengthy stochastic simulations that would be typically required for large metabolic models. The metabolite distributions take the form of Gaussian mixture models that are directly computable from single-cell expression data and standard deterministic models for metabolic pathways. The proposed mixture models provide a systematic method to predict the impact of biochemical parameters on metabolite distributions. Our method lays the groundwork for identifying the molecular processes that shape metabolic heterogeneity and its functional implications in disease.

Journal article

Hodges M, Yaliraki SN, Barahona M, 2019,

Edge-based formulation of elastic network models

, Physical Review Research, Pages: 033211-033211

We present an edge-based framework for the study of geometric elastic network models to model mechanical interactions in physical systems. We use a formulation in the edge space, instead of the usual node-centric approach, to characterise edge fluctuations of geometric networks defined in d-dimensional space and define the edge mechanical embeddedness, an edge mechanical susceptibility measuring the force felt on each edge given a force applied on the whole system. We further show that this formulation can be directly related to the infinitesimal rigidity of the network, which additionally permits three- and four-centre forces to be included in the network description. We exemplify the approach in protein systems, at both the residue and atomistic levels of description.

Journal article

McGrath T, Spreckley E, Rodriguez A, Viscomi C, Alamshah A, Akalestou E, Murphy K, Jones N et al., 2019,

The homeostatic dynamics of feeding behaviour identify novel mechanisms of anorectic agents

, PLoS Biology, Vol: 17, Pages: 1-30, ISSN: 1544-9173

Better understanding of feeding behaviour will be vital in reducing obesity and metabolic syndrome, but we lack a standard model that captures the complexity of feeding behaviour. We construct an accurate stochastic model of rodent feeding at the bout level in order to perform quantitative behavioural analysis. Analysing the different effects on feeding behaviour of PYY3-36, lithium chloride, GLP-1 and leptin shows the precise behavioural changes caused by each anorectic agent. Our analysis demonstrates that the changes in feeding behaviour evoked by the anorectic agents investigated do not mimic the behaviour of well-fed animals, and that the intermeal interval is influenced by fullness. We show how robust homeostatic control of feeding thwarts attempts to reduce food intake, and how this might be overcome. In silico experiments suggest that introducing a minimum intermeal interval or modulating upper gut emptying can be as effective as anorectic drug administration.

Journal article

Latorre-Pellicer A, Lechuga-Vieco AV, Johnston IG, Hämäläinen RH, Pellico J, Justo-Méndez R, Fernández-Toro JM, Clavería C, Guaras A, Sierra R, Llop J, Torres M, Criado LM, Suomalainen A, Jones NS, Ruíz-Cabello J, Enríquez JA et al., 2019,

Regulation of mother-to-offspring transmission of mtDNA heteroplasmy

, Cell Metabolism, Vol: 30, Pages: 1120-1130.e5, ISSN: 1550-4131

mtDNA is present in multiple copies in each cell derived from the expansions of those in the oocyte. Heteroplasmy, more than one mtDNA variant, may be generated by mutagenesis, paternal mtDNA leakage, and novel medical technologies aiming to prevent inheritance of mtDNA-linked diseases. Heteroplasmy phenotypic impact remains poorly understood. Mouse studies led to contradictory models of random drift or haplotype selection for mother-to-offspring transmission of mtDNA heteroplasmy. Here, we show that mtDNA heteroplasmy affects embryo metabolism, cell fitness, and induced pluripotent stem cell (iPSC) generation. Thus, genetic and pharmacological interventions affecting oxidative phosphorylation (OXPHOS) modify competition among mtDNA haplotypes during oocyte development and/or at early embryonic stages. We show that heteroplasmy behavior can fall on a spectrum from random drift to strong selection, depending on mito-nuclear interactions and metabolic factors. Understanding heteroplasmy dynamics and its mechanisms provide novel knowledge of a fundamental biological process and enhance our ability to mitigate risks in clinical applications affecting mtDNA transmission.

Book chapter

Schaub MT, Delvenne J-C, Lambiotte R, Barahona M et al., 2019,

Structured networks and coarse-grained descriptions: a dynamical perspective

, Advances in Network Clustering and Blockmodeling, Editors: Doreian, Batagelj, Ferligoj, Publisher: John Wiley and Sons, Ltd, Pages: 333-361, ISBN: 9781119224709

This chapter discusses the interplay between structure and dynamics in complex networks. Given a particular network with an endowed dynamics, our goal is to find partitions aligned with the dynamical process acting on top of the network. We thus aim to gain a reduced description of the system that takes into account both its structure and dynamics. In the first part, we introduce the general mathematical setup for the types of dynamics we consider throughout the chapter. We provide two guiding examples, namely consensus dynamics and diffusion processes (random walks), motivating their connection to social network analysis, and provide a brief discussion on the general dynamical framework and its possible extensions. In the second part, we focus on the influence of graph structure on the dynamics taking place on the network, focusing on three concepts that allow us to gain insight into this notion. First, we describe how time scale separation can appear in the dynamics on a network as a consequence of graph structure. Second, we discuss how the presence of particular symmetries in the network give rise to invariant dynamical subspaces that can be precisely described by graph partitions. Third, we show how this dynamical viewpoint can be extended to study dynamics on networks with signed edges, which allow us to discuss connections to concepts in social network analysis, such as structural balance. In the third part, we discuss how to use dynamical processes unfolding on the network to detect meaningful network substructures. We then show how such dynamical measures can be related to seemingly different algorithm for community detection and coarse-graining proposed in the literature. We conclude with a brief summary and highlight interesting open future directions.

Journal article

Lubba CH, Sethi SS, Knaute P, Schultz SR, Fulcher BD, Jones NS et al., 2019,

catch22: CAnonical time-series CHaracteristics

, Data Mining and Knowledge Discovery, Vol: 33, Pages: 1821-1852, ISSN: 1384-5810

Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147,000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a set of 22 CAnonical Time-series CHaracteristics, catch22, tailored to the dynamics typically encountered in time-series data-mining tasks. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.

Journal article

Peach R, Yaliraki S, Lefevre D, Barahona M et al., 2019,

Data-driven unsupervised clustering of online learner behaviour

, npj Science of Learning, Vol: 4, ISSN: 2056-7936

The widespread adoption of online courses opens opportunities for analysing learner behaviour and optimising web-based learning adapted to observed usage. Here we introduce a mathematical framework for the analysis of time series of online learner engagement, which allows the identification of clusters of learners with similar online temporal behaviour directly from the raw data without prescribing a priori subjective reference behaviours. The method uses a dynamic time warping kernel to create a pairwise similarity between time series of learner actions, and combines it with an unsupervised multiscale graph clustering algorithm to identify groups of learners with similar temporal behaviour. To showcase our approach, we analyse task completion data from a cohort of learners taking an online post-graduate degree at Imperial Business School. Our analysis reveals clusters of learners with statistically distinct patterns of engagement, from distributed to massed learning, with different levels of regularity, adherence to pre-planned course structure and task completion. The approach also reveals outlier learners with highly sporadic behaviour. A posteriori comparison against student performance shows that, whereas high performing learners are spread across clusters with diverse temporal engagement, low performers are located significantly in the massed learning cluster, and our unsupervised clustering identifies low performers more accurately than common machine learning classification methods trained on temporal statistics of the data. Finally, we test the applicability of the method by analysing two additional datasets: a different cohort of the same course, and time series of different format from another university.
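
To make the dynamic time warping step mentioned in this abstract concrete, here is a minimal sketch of the classic DTW distance between two numeric sequences, plus one simple way of turning it into a pairwise similarity. The function names, the absolute-difference local cost and the exponential similarity with scale `sigma` are illustrative assumptions, not the kernel used in the paper.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed choice)
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b
                cost[i][j - 1],      # b[j-1] also matched to an earlier a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def dtw_similarity(a, b, sigma=1.0):
    """Hypothetical similarity in (0, 1]: larger means more alike."""
    return math.exp(-dtw_distance(a, b) / sigma)
```

Evaluating `dtw_similarity` on every pair of learner time series would give a similarity matrix that a graph clustering method could take as input. Note that this exponential form is not guaranteed to be a positive-definite kernel.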

Journal article

Kuntz Nussio J, Thomas P, Stan GB, Barahona M et al., 2019,

Bounding the stationary distributions of the chemical master equation via mathematical programming

, Journal of Chemical Physics, Vol: 151, ISSN: 0021-9606

The stochastic dynamics of biochemical networks are usually modelled with the chemical master equation (CME). The stationary distributions of CMEs are seldom solvable analytically, and numerical methods typically produce estimates with uncontrolled errors. Here, we introduce mathematical programming approaches that yield approximations of these distributions with computable error bounds which enable the verification of their accuracy. First, we use semidefinite programming to compute increasingly tighter upper and lower bounds on the moments of the stationary distributions for networks with rational propensities. Second, we use these moment bounds to formulate linear programs that yield convergent upper and lower bounds on the stationary distributions themselves, their marginals and stationary averages. The bounds obtained also provide a computational test for the uniqueness of the distribution. In the unique case, the bounds form an approximation of the stationary distribution with a computable bound on its error. In the non-unique case, our approach yields converging approximations of the ergodic distributions. We illustrate our methodology through several biochemical examples taken from the literature: Schlögl’s model for a chemical bifurcation, a two-dimensional toggle switch, a model for bursty gene expression, and a dimerisation model with multiple stationary distributions.
