An alternative Bayesian neural network prior that we might believe a little more, but that sadly doesn't work very well.
An overview of some recent work, published at ICLR 2024, where we estimate uncertainty and marginal likelihoods in LLMs using Bayesian LoRA adapters. We focus on the fine-tuning setting, and scale our method to LLMs using a Laplace approximation with low-rank K-FAC.
A motivation for the Hessian from an optimisation perspective (and the related generalised Gauss-Newton and Fisher information matrices), an introduction to Kronecker-factored approximate curvature (K-FAC), and applications of curvature in machine learning.
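For context, the central approximation behind K-FAC is a standard result (Martens & Grosse, 2015): for a linear layer with input activations $\mathbf{a}$ and back-propagated output gradients $\mathbf{g}$, the corresponding Fisher block factorises approximately as a Kronecker product of two small second-moment matrices,

$$
\mathbf{F}_{\ell} \approx \mathbb{E}\big[\mathbf{a}\mathbf{a}^\top\big] \otimes \mathbb{E}\big[\mathbf{g}\mathbf{g}^\top\big],
$$

which is far cheaper to store and invert than the full block.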
Some intuitions and visualisations of vector-Jacobian products and Jacobian-vector products, to help you avoid confusing the two again.
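As a quick taster, here is a minimal sketch of the distinction, assuming PyTorch 2's `torch.func` API (the function `f` and the vectors are made up for illustration): the two products differ only in which space the vector lives in, and which way it flows through the Jacobian.

```python
# Minimal sketch contrasting JVPs and VJPs, assuming PyTorch >= 2.0
# (torch.func); f and the vectors below are illustrative only.
import torch
from torch.func import jvp, vjp

def f(x):  # f: R^3 -> R^2, so its Jacobian J(x) has shape (2, 3)
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.randn(3)
v_in = torch.randn(3)   # tangent vector: lives in the input space
v_out = torch.randn(2)  # cotangent vector: lives in the output space

# JVP pushes a tangent forwards through f: returns J(x) @ v_in, shape (2,)
_, Jv = jvp(f, (x,), (v_in,))

# VJP pulls a cotangent backwards through f: returns v_out @ J(x), shape (3,)
_, vjp_fn = vjp(f, x)
(vJ,) = vjp_fn(v_out)
```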
A note on fine-tuning transformer language models on synthetically generated training data.
An overview of approximation methods and computational techniques for scaling Gaussian processes to large, high-dimensional datasets; covering training conditionals and variational approximations.
A review of the basic methods behind Bayesian linear regression, as well as modern techniques for approximate inference, dealing with non-conjugate priors and scaling this model to large datasets.
An overview of some recently proposed methods for using diffusion models with discrete data, and some associated challenges.
An explanation of the recently published Bayesian Flow Networks, along with a PyTorch implementation.
Or “PL3E” for short: a versatile likelihood defined by a product of piecewise-linear log-likelihood functions.
A self-contained introduction for computer scientists, physicists, mathmos and anyone else interested in making predictions from data.
A gentle overview of some essential Gaussian identities and derivations.
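As a flavour of the identities covered there, the standard conditioning formula for a jointly Gaussian vector:

$$
\begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix}, \begin{pmatrix} \mathbf{A} & \mathbf{C} \\ \mathbf{C}^\top & \mathbf{B} \end{pmatrix} \right)
\implies
\mathbf{a} \mid \mathbf{b} \sim \mathcal{N}\left( \boldsymbol{\mu}_a + \mathbf{C}\mathbf{B}^{-1}(\mathbf{b} - \boldsymbol{\mu}_b),\; \mathbf{A} - \mathbf{C}\mathbf{B}^{-1}\mathbf{C}^\top \right).
$$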