
Abstract: High-dimensional data are routinely collected in a wide variety of application areas.  As the sample size is often modest relative to the dimension of the data, it is necessary to develop dimensionality reduction methods.  Most of these methods are based on assumptions of sparsity and/or linearity.  For example, PCA reduces the original p-dimensional data to coordinates on a k-dimensional affine subspace, with k typically much less than p.  There is a literature on non-linear extensions of PCA; such approaches are commonly referred to as “manifold learning” – supposing that the lower-dimensional subspace consists of one or more smooth Riemannian manifolds.  Almost all such approaches rely on the assumption of local linearity, which can be justified by the fact that manifolds are approximately Euclidean in local neighborhoods.  The key problem with such approaches is that a very large number of linear pieces are needed if the manifold(s) have moderate to high curvature; for example, think of approximating a low-dimensional sphere with planes.  To address this problem, we propose a broad new dictionary for approximating manifolds based on pieces of spheres, or spherelets.  We provide theory showing dramatic reductions in the covering numbers needed to achieve a given small mean squared error (MSE) relative to locally linear methods.  We develop a simple spherical PCA algorithm for implementation, and show very substantial gains in performance relative to the state-of-the-art on toy and non-toy examples.  We additionally develop a Bayesian model that characterizes uncertainty in subspace learning, along with a simple and efficient MCMC algorithm for implementation.
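The abstract does not spell out the spherical PCA algorithm, but the basic building block of a sphere-based dictionary can be illustrated with a standard algebraic least-squares sphere fit (a sketch, not the speaker's exact method): expanding ||x - c||² = r² turns the sphere constraint into a linear system in the center c and an auxiliary scalar b = r² - ||c||², which is solvable in closed form. The example below is self-contained NumPy; names like `fit_sphere` are illustrative.

```python
import numpy as np

def fit_sphere(X):
    """Least-squares sphere fit (algebraic): find center c and radius r
    for the rows of X.  Expanding ||x - c||^2 = r^2 gives the linear
    system  2 x.c + b = ||x||^2  with  b = r^2 - ||c||^2."""
    n, p = X.shape
    A = np.hstack([2 * X, np.ones((n, 1))])      # unknowns: (c, b)
    y = (X ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    c, b = sol[:p], sol[p]
    r = np.sqrt(b + c @ c)
    return c, r

# Points on a unit circle: a single sphere (here, circle) fits exactly,
# whereas a locally linear dictionary would need many tangent-plane pieces
# to reach the same approximation error -- the curvature issue the
# abstract describes.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
c, r = fit_sphere(X)   # c near (0, 0), r near 1
```

In a spherelet-style dictionary, a fit of this kind would be applied to local neighborhoods of the data, replacing the local planes of standard manifold-learning methods.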

CV:  David Dunson is Arts and Sciences Distinguished Professor of Statistical Science, Mathematics, and Electrical & Computer Engineering at Duke University.

His research focuses on Bayesian statistical theory and methods motivated by high-dimensional and complex applications.  A particular emphasis is on dimensionality reduction, scalable inference algorithms, latent factor models, and nonparametric approaches, especially for high-dimensional, dynamic and multimodal data, including images, functions, shapes and other complex objects.

His work involves interdisciplinary thinking at the intersection of statistics, mathematics and computer science. Motivation comes from applications in epidemiology, environmental health, neuroscience, genetics, fertility and other settings (music, fine arts, humanities).

Dr. Dunson is a fellow of the American Statistical Association and of the Institute of Mathematical Statistics.  He is the winner of the 2007 Mortimer Spiegelman Award for the top public health statistician under 41, the 2010 Myrto Lefkopoulou Distinguished Lectureship at Harvard University, the 2010 COPSS Presidents’ Award for the top statistician under 41, and the 2012 Youden Award for interlaboratory testing methods.