# Bayesian linear regression model – simple, yet useful results

I remember that, when I started studying statistics, I used to criticize some statistical books that insisted on starting the subject of Bayesian regression models by obtaining the posterior distribution of the regression coefficients ${\beta}$ while assuming the variance ${\sigma^2}$ to be known. Or the other way around, obtaining the posterior of ${\sigma^2}$ while assuming ${\beta}$ known. Of course, my lack of experience at the time prevented me from realizing that these simplified results are very useful and are sometimes building blocks of more complex solutions, as I describe later.

It turns out that, from time to time, I need those simplified solutions to solve a much bigger problem. After recomputing those results for the ${n}$-th time (sometimes I am somewhere without a basic statistics book at my side) I decided to write them down here, so that next time I know exactly where to find them. Besides, it is always good to remember that simple solutions put together can give you quite powerful tools.

Model specification

Assume a simple Bayesian regression model in which the response vector ${y}$ has dimension ${n \times 1}$ and follows a multivariate Gaussian distribution with mean ${X\beta}$ and covariance matrix ${\sigma^2 I}$, where ${X}$ is the ${n \times p}$ design matrix, ${\beta}$ contains the ${p}$ regression coefficients, ${\sigma^2}$ is the common variance of the observations and ${I}$ is an ${n \times n}$ identity matrix. That is,

$\displaystyle y \sim N(X\beta, \sigma^2 I).$

The Bayesian model is completed by specifying prior distributions for the coefficients ${\beta}$ and for the precision ${\phi = \sigma ^{-2}}$. Let's say ${\beta}$ and ${\phi}$ are a priori independent with priors

$\displaystyle \beta \sim N(\mu_0, \Sigma_0) \quad \text{and} \quad \phi \sim \text{Gamma}(a_0, b_0),$

where ${\mu_0}$ and ${\Sigma_0}$ are the mean and covariance matrix of the Gaussian distribution, while ${a_0}$ and ${b_0}$ are the shape and rate parameters of the Gamma distribution. ${\mu_0}$, ${\Sigma_0}$, ${a_0}$ and ${b_0}$ are assumed to be known.
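As a concrete illustration, the model above can be simulated in a few lines of NumPy. All dimensions, coefficient values and hyperparameters below are hypothetical, chosen only for the example:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dimensions and "true" parameters, for illustration only.
n, p = 100, 3
beta_true = np.array([1.0, -2.0, 0.5])
sigma2_true = 0.25

# Design matrix: an intercept column plus standard-normal covariates.
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])

# y ~ N(X beta, sigma^2 I)
y = X @ beta_true + np.sqrt(sigma2_true) * rng.standard_normal(n)

# Prior hyperparameters (assumed known):
# beta ~ N(mu0, Sigma0) and phi ~ Gamma(a0, b0).
mu0 = np.zeros(p)
Sigma0 = 10.0 * np.eye(p)
a0, b0 = 0.1, 0.1
```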

Posterior of ${\beta}$ assuming ${\phi}$ to be known

If we assume ${\phi}$ to be known, we have that the posterior for ${\beta}$, given ${\phi}$, is

$\displaystyle \beta|y, \phi \sim N(\mu_1, \Sigma_1),$

where

$\displaystyle \Sigma_1 = (\Sigma_0^{-1} + \phi X^T X)^{-1} \quad \text{and} \quad \mu_1 = \Sigma_1(\Sigma_0^{-1} \mu_0 + \phi X^T y).$
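Computationally, this update is just a couple of linear-algebra operations. A minimal NumPy sketch (the helper name `beta_posterior` is my own, not from any library):

```python
import numpy as np

def beta_posterior(X, y, phi, mu0, Sigma0):
    """Posterior N(mu1, Sigma1) of beta given phi (illustrative helper).

    Sigma1 = (Sigma0^{-1} + phi X^T X)^{-1}
    mu1    = Sigma1 (Sigma0^{-1} mu0 + phi X^T y)
    """
    Sigma0_inv = np.linalg.inv(Sigma0)
    Sigma1 = np.linalg.inv(Sigma0_inv + phi * X.T @ X)
    mu1 = Sigma1 @ (Sigma0_inv @ mu0 + phi * X.T @ y)
    return mu1, Sigma1
```

Note that as the prior becomes diffuse (${\Sigma_0^{-1} \rightarrow 0}$), ${\mu_1}$ approaches the ordinary least squares estimate ${(X^T X)^{-1} X^T y}$, as one would expect.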

Posterior of ${\phi}$ assuming ${\beta}$ to be known

If we assume ${\beta}$ to be known, we have that the posterior for ${\phi}$, given ${\beta}$, is

$\displaystyle \phi|y, \beta \sim \text{Gamma}(a_1, b_1),$

where

$\displaystyle a_1 = a_0 + \frac{n}{2} \quad \text{and} \quad b_1 = b_0 + \frac{1}{2}(y - X\beta)^T(y - X\beta).$
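This update is even simpler: the shape grows with the sample size and the rate grows with the residual sum of squares. A NumPy sketch (again with an illustrative helper name):

```python
import numpy as np

def phi_posterior(X, y, beta, a0, b0):
    """Posterior Gamma(a1, b1) of phi given beta (illustrative helper).

    a1 = a0 + n/2
    b1 = b0 + 0.5 * (y - X beta)^T (y - X beta)
    """
    resid = y - X @ beta
    a1 = a0 + len(y) / 2.0
    b1 = b0 + 0.5 * resid @ resid
    return a1, b1
```

Here the Gamma distribution is parameterized by shape and rate, so the posterior mean of ${\phi}$ is ${a_1/b_1}$.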

Simple, yet useful

You can argue that assuming ${\beta}$ or ${\phi}$ known in the context above is overly simplistic, and I agree with that. However, it turns out that knowing the distributions of ${\beta|y, \phi}$ and ${\phi|y, \beta}$ can prove extremely useful in more elaborate settings, where they become building blocks of more complex solutions.

Just to give two examples, both posteriors above are useful when computing full conditionals (the distribution of an unknown given all the other unknowns in the model), which are often necessary when implementing a Markov chain Monte Carlo (MCMC) scheme [1]. Another related case where this knowledge is helpful is when we want to implement Variational Bayes using a factorized approximation of an intractable posterior distribution (see for example Eq. (4) of my post about Variational Bayes).
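To make the "building blocks" point concrete, here is a minimal Gibbs sampler sketch that alternates between the two full conditionals derived above. The function name, initial values and defaults are illustrative, and a real implementation would add burn-in handling and convergence diagnostics:

```python
import numpy as np

def gibbs_lm(X, y, mu0, Sigma0, a0, b0, n_iter=2000, rng=None):
    """Minimal Gibbs sampler sketch for the Bayesian linear model.

    Alternates draws from beta | y, phi ~ N(mu1, Sigma1) and
    phi | y, beta ~ Gamma(a1, b1), the two conditional posteriors above.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    Sigma0_inv = np.linalg.inv(Sigma0)
    XtX, Xty = X.T @ X, X.T @ y

    beta, phi = np.zeros(p), 1.0  # arbitrary starting values
    beta_draws = np.empty((n_iter, p))
    phi_draws = np.empty(n_iter)

    for t in range(n_iter):
        # Draw beta from its full conditional given the current phi.
        Sigma1 = np.linalg.inv(Sigma0_inv + phi * XtX)
        mu1 = Sigma1 @ (Sigma0_inv @ mu0 + phi * Xty)
        beta = rng.multivariate_normal(mu1, Sigma1)

        # Draw phi from its full conditional given the current beta.
        resid = y - X @ beta
        a1 = a0 + n / 2.0
        b1 = b0 + 0.5 * resid @ resid
        phi = rng.gamma(shape=a1, scale=1.0 / b1)  # NumPy uses shape/scale

        beta_draws[t], phi_draws[t] = beta, phi

    return beta_draws, phi_draws
```

After discarding an initial burn-in, the retained draws approximate the joint posterior of ${(\beta, \phi)}$, even though each step only uses one of the simple conditional results.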

References:

[1] Robert, C.P., Casella, G. (2004). Monte Carlo statistical methods (Vol. 319). New York: Springer.
