Speakers: Vladimir Gligorijevic and Richard A Bonneau (Flatiron Institute, New York)
Title: Large Scale Machine Learning Methods for Predicting Protein Functions from Sequence, Structure and Networks
Abstract: Because existing experimental methods for determining protein function are limited and costly, the vast majority of proteins across many organisms remain unannotated. Developing machine learning (ML) methods that combine large-scale, genome-wide heterogeneous data to extract protein feature representations useful for function prediction therefore remains a key problem in biology. We will review a few of our recent deep learning-based methods for predicting function from various data types, including protein sequences, structures and protein-protein interaction networks. We will then present our recent integrative method, deepNF (deep network fusion), which integrates different protein-protein interaction networks using autoencoders to construct a shared protein feature representation indicative of protein function. In the second part of the talk, we will focus on integrative methods for predicting function from protein sequence and structure. Lastly, we will present a method based on graph convolutional networks (GCNs) that extracts features from protein contact maps and sequences while learning to predict function. We will discuss the performance of the GCN in predicting the functions of experimentally determined structures from the PDB and how we plan to apply this method to Rosetta-predicted structures.
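
For readers unfamiliar with graph convolutions on contact maps, the following is a minimal NumPy sketch of the general idea, not the speakers' model. It assumes a binary, symmetric residue contact map with a zero diagonal, one-hot sequence features, a single graph convolution with the common symmetric normalization, sum pooling over residues, and a sigmoid output for multi-label function prediction. All function names, layer sizes, the toy sequence and the random (untrained) weights are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode a protein sequence as an (L, 20) one-hot feature matrix."""
    idx = [AMINO_ACIDS.index(a) for a in seq]
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), idx] = 1.0
    return x


def normalize_adjacency(contact_map):
    """Return D^{-1/2} (A + I) D^{-1/2} for an (L, L) contact map A."""
    a = contact_map + np.eye(contact_map.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One graph convolution: mix neighbour features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


def predict_functions(seq, contact_map, w1, w_out):
    """Per-residue GCN features -> pooled protein vector -> label probabilities."""
    h = gcn_layer(normalize_adjacency(contact_map), one_hot(seq), w1)
    pooled = h.sum(axis=0)                 # sum pooling over residues
    logits = pooled @ w_out
    return 1.0 / (1.0 + np.exp(-logits))   # independent sigmoid per function label


# Toy usage with an assumed 8-residue sequence and chain-neighbour contacts only.
rng = np.random.default_rng(0)
seq = "MKTAYIAK"
n = len(seq)
contacts = np.zeros((n, n))
i = np.arange(n - 1)
contacts[i, i + 1] = contacts[i + 1, i] = 1.0

w1 = rng.normal(scale=0.1, size=(20, 16))    # hypothetical hidden size of 16
w_out = rng.normal(scale=0.1, size=(16, 3))  # hypothetical 3 function labels
probs = predict_functions(seq, contacts, w1, w_out)
```

In practice the weights would be learned from annotated proteins; here they are random, so `probs` only illustrates the data flow from sequence and contact map to one probability per function label.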