Fast Bayesian Inference with INLA

Update: As of March 2014 I am no longer a Research Fellow at NTNU, see here.

I am currently a research fellow and 4th-year PhD candidate within the INLA group. If you deal with Bayesian models and have never heard of INLA, I sincerely think you should spend a small portion of your time to at least learn what it is. If you have heard of it before, you know how nice it is, and I would like to ask you to help us spread the word. After all, it can really help applied researchers who work with time-consuming modeling tasks, such as those that involve spatial and spatio-temporal modeling.

INLA is a deterministic method and provides a faster and more accurate alternative to simulation-based MCMC schemes within the class of latent Gaussian models. I am not talking about a slight improvement over commonly used methods. I am talking about reducing your computational time by orders of magnitude. It is not uncommon to see cases where the computational time drops from weeks to hours or from days to minutes.

[Figure: posterior marginals for parameters of the EPIL example, approximated by INLA (solid blue lines) and MCMC (histograms).]

Just to give an example, the figure above compares the posterior marginals approximated by INLA (solid blue lines) and MCMC (histograms) for some parameters of the EPIL example. I know the quality of the figure is not great, but those pictures were computed quite a while ago and I just haven't had the time to generate the histograms again (remember, I am a 4th-year PhD candidate, thesis to finish and all, haha). The approximations returned by INLA are basically instantaneous, while MCMC only matches INLA's accuracy after 16 minutes of computation time (using JAGS). You can find the (very simple) INLA R code for this example on the INLA website.
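If you are curious about what an INLA fit looks like in practice, here is a minimal sketch along the lines of that example. It assumes the Epil data set shipped with R-INLA (column names follow the classic BUGS version of the data); the exact formula, with fixed effects plus two iid random effects, is my illustrative guess and may differ from the code on the INLA website.

```r
## Minimal sketch of an INLA fit for the EPIL example.
## Assumes the Epil data set shipped with R-INLA; the formula is illustrative.
library(INLA)
data(Epil)

## Poisson GLMM: fixed effects plus iid random effects for
## patient (Ind) and observation (rand).
formula <- y ~ Trt + Base + Age + V4 +
  f(Ind, model = "iid") + f(rand, model = "iid")

result <- inla(formula, family = "poisson", data = Epil)
summary(result)

## Posterior marginal of the treatment effect (names follow the formula):
marg <- result$marginals.fixed[["Trt"]]
plot(marg, type = "l", xlab = expression(beta[Trt]), ylab = "density")
```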

As I said before, for large datasets and/or complex models, INLA can be orders of magnitude faster than MCMC. At this point, if you are eager to try INLA, I suggest you download and install the R package INLA and take a look at the worked-out examples on the INLA website. If you stay with me for the weeks to come, I plan to write more details and useful information about INLA and its R package. If you are not familiar with the concept of latent Gaussian models, I would like to point out that the following models belong to this broad class:

  • Dynamic linear models
  • Stochastic volatility
  • Generalized linear (mixed) models
  • Generalized additive (mixed) models
  • Spline smoothing 
  • Semi-parametric regression
  • Space-varying (semi-parametric) regression models
  • Disease mapping
  • Log-Gaussian Cox processes
  • Model-based geostatistics
  • Spatio-temporal models
  • Survival analysis
  • +++

If you have any doubts regarding INLA there is also a very helpful INLA discussion forum.

Bayesian linear regression model – simple, yet useful results

I remember that, when I started studying statistics, I used to criticize some statistics books that insisted on starting the subject of Bayesian regression models by obtaining the posterior distribution of the regression coefficients {\beta} while assuming the variance {\sigma^2} to be known, or the other way around, obtaining the posterior of {\sigma^2} while assuming {\beta} known. Of course, my lack of experience at the time prevented me from realizing that these simplified results are very useful and are sometimes building blocks of more complex solutions, as I describe later.

It turns out that, from time to time, I need those simplified solutions to solve a much bigger problem. After recomputing those results for the {n}-th time (sometimes I am somewhere without a basic statistics book at hand) I decided to write them down here, so that next time I know exactly where to find them. Besides, it is always good to remember that simple solutions put together can give you quite powerful tools.

Model specification

Assume a simple Bayesian regression model where the response vector {y} has dimension {n \times 1} and follows a multivariate Gaussian distribution with mean {X\beta} and covariance matrix {\sigma^2 I}, where {X} is the {n \times p} design matrix, {\beta} contains the {p} regression coefficients, {\sigma^2} is the common variance of the observations and {I} is an {n \times n} identity matrix. That is,

\displaystyle y \sim N(X\beta, \sigma^2 I).

The Bayesian model is completed by specifying a prior distribution for the coefficients {\beta} and for the precision {\phi = \sigma^{-2}}. Let's say {\beta} and {\phi} are a priori independent, with priors

\displaystyle \beta \sim N(\mu_0, \Sigma_0) \quad \text{and} \quad \phi \sim \text{Gamma}(a_0, b_0),

where {\mu_0} and {\Sigma_0} are the mean and covariance matrix of the Gaussian distribution, while {a_0} and {b_0} are the shape and rate parameters of the Gamma distribution. {\mu_0}, {\Sigma_0}, {a_0} and {b_0} are assumed to be known.
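Before moving on, here is a small R sketch that simulates data from this model and fixes the prior hyperparameters. Every numeric value in it is an arbitrary choice of mine, used only so that the snippets in the following sections have something to run on.

```r
## Simulate data from y ~ N(X beta, sigma^2 I).
## All sizes and values below are arbitrary, for illustration only.
set.seed(1)
n <- 100; p <- 3
X     <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # design matrix with intercept
beta  <- c(1, -0.5, 2)                                   # "true" coefficients
sigma <- 0.8                                             # "true" error sd
y     <- drop(X %*% beta + rnorm(n, sd = sigma))

## Prior hyperparameters (assumed known):
mu0    <- rep(0, p)           # prior mean of beta
Sigma0 <- diag(100, p)        # vague prior covariance of beta
a0 <- 0.01; b0 <- 0.01        # shape and rate of the Gamma prior on phi
```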

Posterior of {\beta} assuming {\phi} to be known

If we assume {\phi} to be known, we have that the posterior for {\beta}, given {\phi}, is

\displaystyle \beta|y, \phi \sim N(\mu_1, \Sigma_1),

where

\displaystyle \Sigma_1 = (\Sigma_0^{-1} + \phi X^T X)^{-1} \quad \text{and} \quad \mu_1 = \Sigma_1(\Sigma_0^{-1} \mu_0 + \phi X^T y).
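A minimal sketch of this update in R, continuing the objects from the simulation snippet above (here I simply plug in the true precision as if it were known):

```r
## Conditional posterior beta | y, phi, with phi treated as known.
## Continues the objects defined in the simulation snippet above.
phi <- 1 / sigma^2                 # pretend the precision is known

Sigma1 <- solve(solve(Sigma0) + phi * crossprod(X))                   # posterior covariance
mu1    <- Sigma1 %*% (solve(Sigma0) %*% mu0 + phi * crossprod(X, y))  # posterior mean
round(cbind(truth = beta, post.mean = drop(mu1)), 3)
```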

Posterior of {\phi} assuming {\beta} to be known

If we assume {\beta} to be known, we have that the posterior for {\phi}, given {\beta}, is

\displaystyle \phi|y, \beta \sim \text{Gamma}(a_1, b_1),

where

\displaystyle a_1 = a_0 + \frac{n}{2} \quad \text{and} \quad b_1 = b_0 + \frac{1}{2}(y - X\beta)^T(y - X\beta).
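The corresponding update for {\phi} in R, again continuing the simulation snippet and pretending {\beta} is known:

```r
## Conditional posterior phi | y, beta, with beta treated as known.
## Continues the objects defined in the simulation snippet above.
resid <- y - X %*% beta            # residuals at the "true" beta
a1 <- a0 + n / 2
b1 <- b0 + 0.5 * sum(resid^2)
c(post.mean = a1 / b1, truth = 1 / sigma^2)   # posterior mean of phi vs. truth
```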

Simple, yet useful

You can argue that assuming {\beta} or {\phi} known in the context above is overly simplistic, and I agree with that. However, it turns out that knowing the distributions of {\beta|y, \phi} and {\phi|y, \beta} can prove extremely useful in more complex situations, where they become building blocks of larger solutions.

Just to give two examples, both posteriors above are useful when computing full conditionals (the distribution of one unknown given all the other unknowns in the model), which are often necessary when implementing a Markov chain Monte Carlo (MCMC) scheme [1]. Another related case where this knowledge is helpful is when we want to implement Variational Bayes using a factorized approximation of an intractable posterior distribution (see, for example, Eq. (4) of my post about Variational Bayes).
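To make the building-block point concrete, here is a minimal Gibbs sampler sketch that alternates draws from the two full conditionals derived above. It again continues the simulation snippet; the number of iterations and the burn-in are arbitrary choices.

```r
## Minimal Gibbs sampler for (beta, phi), alternating the two
## full conditionals derived above. Continues the simulation snippet.
library(MASS)                      # for mvrnorm()

n.iter   <- 5000
beta.cur <- rep(0, p)              # arbitrary starting values
phi.cur  <- 1
draws <- matrix(NA, n.iter, p + 1,
                dimnames = list(NULL, c(paste0("beta", 1:p), "phi")))

Sigma0.inv <- solve(Sigma0)
for (i in 1:n.iter) {
  ## beta | y, phi ~ N(mu1, Sigma1)
  Sigma1   <- solve(Sigma0.inv + phi.cur * crossprod(X))
  mu1      <- Sigma1 %*% (Sigma0.inv %*% mu0 + phi.cur * crossprod(X, y))
  beta.cur <- mvrnorm(1, drop(mu1), Sigma1)

  ## phi | y, beta ~ Gamma(a1, b1)
  resid   <- y - X %*% beta.cur
  phi.cur <- rgamma(1, shape = a0 + n / 2, rate = b0 + 0.5 * sum(resid^2))

  draws[i, ] <- c(beta.cur, phi.cur)
}
colMeans(draws[-(1:1000), ])       # posterior means after burn-in
```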

References:

[1] Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. New York: Springer.

Related posts:

Introduction to Variational Bayes