Statistical Information Theory

Module aims

What does fitting a machine learning model have to do with copying a file over a computer network? The answer, you may be surprised to hear, is quite a lot. The fundamental principles of information theory were originally devised to study optimal communication over information channels, but have since found widespread applications across data science and computing. This module will introduce the basics of information theory as devised by Claude Shannon in 1948, and then delve into its deep connections with statistics and machine learning. Through mathematical and computational exercises, the module presents information as a unifying theory linking computing, statistics, and geometry, and provides crucial theoretical background for students wishing to pursue a career in data science or machine learning.

Learning outcomes

Upon successful completion of this module you will be able to:
- Explain the key properties of information metrics in terms of communication principles.
- Calculate these metrics for common probability distributions.
- Compare different metrics and coding schemes on real-world data.
- Explain the mathematical connections between data transmission and model fitting.
- Design principled data analysis plans with appropriate statistical criteria.

Module syllabus

The module will cover a variety of topics in information theory and statistics, including:
- Entropy and the source coding theorem.
- Mutual information and the channel coding theorem.
- Simple coding schemes, such as Huffman and Hamming codes.
- Introduction to information geometry and projections.
- Connections between information theory and maximum likelihood estimation.
- Recent trends in multivariate information theory for data analysis.

Teaching methods

The material will be taught mostly through traditional lectures, backed up by formative unassessed problems designed to reinforce your understanding of the material. There will be one or more assessed coursework exercises, possibly involving practical laboratory-based exercises.
 
An online service will be used as an open discussion forum for the module.

Assessments

A final written exam will contribute most of the marks, with the remainder coming from coursework exercises.

Written and verbal feedback will be provided throughout the module. Detailed written feedback will be provided on the coursework. Class-wide feedback will be provided after the exam.
 

Reading list

Module leaders

Dr Pedro Mediano