13:30 – 14:30 – Dr. Hélène Ruffieux (University of Cambridge)
Title: A Bayesian hierarchical approach to joint functional principal component analysis for complex sampling designs
Abstract: The analysis of multivariate functional curves has the potential to yield important scientific discoveries in domains such as healthcare, medicine, economics and social sciences. However, it is common for real-world data to be both sparse and irregularly sampled, which poses important challenges for current functional data methodology. In this talk I will present a Bayesian hierarchical modelling framework for performing functional principal component analysis (FPCA) on p > 1 related variables observed longitudinally in complex sampling design settings. Our approach relies on a multivariate extension of the Karhunen-Loève theorem, where functional principal component scores are shared across all p variables. These shared scores permit flexible pooling of information and offer a parsimonious representation of the data based on the main modes of joint variation of the variables. We treat all quantities from the Karhunen-Loève expansion as unknown and estimate them jointly, using fast variational message passing inference. This work was motivated by a COVID-19 study that aimed to characterise patient heterogeneity in disease dynamics. Our joint framework allowed us to reconstruct blood biomarker trajectories at the patient level and revealed coordinated dynamics across the immune, inflammatory and metabolic systems, which were associated with survival and long-COVID symptoms up to one year after disease onset. This helped clarify how different biological pathways coordinate the organismal response to infection over time and drive systemic recovery. Our method is implemented in the R package BayesFPCA.
15:00 – 16:00 – Prof. Nick Whiteley (University of Bristol)
Title: Statistical exploration of the Manifold Hypothesis
Abstract: The Manifold Hypothesis is a widely held tenet of Machine Learning which asserts that nominally high-dimensional data are in fact concentrated near a low-dimensional manifold, embedded in high-dimensional space. This phenomenon is observed empirically in many real-world situations, has led to the development of a wide range of statistical methods in the last few decades, and has been suggested as a key factor in the success of modern AI technologies. We show that rich manifold structure in data can emerge from a generic and remarkably simple statistical model, the Latent Metric Model, via elementary concepts such as latent variables, correlation and stationarity. This establishes a general statistical explanation for why the Manifold Hypothesis seems to hold in so many situations. Informed by the Latent Metric Model, we derive procedures to discover and interpret the geometry of high-dimensional data, and explore hypotheses about the true data generating mechanism. These procedures operate under minimal assumptions and make use of well-known, scalable graph-analytic algorithms.
This is joint work with Annie Gray and Patrick Rubin-Delanchy.
Refreshments available between 14:30 and 15:00