14:00 – 15:00 – Kelly Zhang (Imperial College London)

Title: Posterior Sampling via Autoregressive Generation

Abstract: Conventionally trained neural networks excel at prediction but often struggle to model the uncertainty in their own predictions. We explore this challenge in a meta-learning bandit decision-making problem for news recommendation, a setting in which decision-making algorithms must incorporate pre-trained language models to process text data for best performance. We present a scalable approach to Bayesian uncertainty quantification by posing it as a problem of autoregressive generative modeling of future rewards. First, we use historical data on previously released news articles to pre-train a generative model to predict sequences of potential future rewards. At inference time, our algorithm makes decisions based on the limited rewards observed so far together with autoregressively generated future rewards. Far from a heuristic, we synthesize insights from the literature to show that our method is a novel implementation of Thompson (posterior) sampling, a prominent bandit algorithm. We prove that our pre-training loss directly controls online decision-making performance, and we demonstrate our framework on a news recommendation task in which end-to-end fine-tuning of a pre-trained language model on news article headline text further improves performance.
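The decision rule described in the abstract (autoregressively impute each arm's missing future rewards with a generative model, then act on the imputed dataset) can be sketched roughly as follows. This is a minimal illustrative sketch, not the speaker's implementation: the Beta-Bernoulli predictive used as the sequence model, and the names next_reward_prob, impute_future_rewards, choose_arm, and the imputation horizon are all hypothetical stand-ins for the pre-trained generative model and decision procedure described above.

    import numpy as np

    rng = np.random.default_rng(0)

    def next_reward_prob(rewards, prior_a=1.0, prior_b=1.0):
        # Toy autoregressive predictive: probability that the next reward is 1
        # given the rewards seen (or generated) so far. A Beta-Bernoulli
        # predictive stands in for the pre-trained neural sequence model.
        return (prior_a + sum(rewards)) / (prior_a + prior_b + len(rewards))

    def impute_future_rewards(observed, horizon):
        # Autoregressively generate hypothetical future rewards for one arm,
        # conditioning on observed rewards and on rewards generated so far.
        seq = list(observed)
        for _ in range(horizon):
            p = next_reward_prob(seq)
            seq.append(int(rng.random() < p))
        return seq

    def choose_arm(observed_per_arm, horizon=500):
        # One decision step: impute each arm's missing rewards, then act
        # greedily on the imputed mean reward (a generation-based form of
        # posterior sampling).
        imputed_means = [np.mean(impute_future_rewards(obs, horizon))
                         for obs in observed_per_arm]
        return int(np.argmax(imputed_means))

    # Usage: three candidate articles with a few observed click / no-click rewards.
    history = [[1, 0, 1], [0, 0], [1]]
    print(choose_arm(history))  # index of the article to recommend next

In this toy case the predictive is exchangeable, so averaging a long generated sequence approximates a draw from the posterior over an arm's mean reward, which is why acting greedily on the imputed means behaves like Thompson sampling; the talk's approach replaces the hand-specified predictive with a generative model pre-trained on historical reward sequences.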

Refreshments available between 15:00 and 15:30 in the Huxley Common Room (HXLY 549)
