I remember that, when I started studying statistics, I used to criticize some statistics books that insisted on starting the subject of Bayesian regression models by obtaining the posterior distribution of the regression coefficients $\boldsymbol{\beta}$ while assuming the variance $\sigma^2$ to be known, or the other way around, obtaining the posterior of $\sigma^2$ while assuming $\boldsymbol{\beta}$ known. Of course, my lack of experience at the time prevented me from realizing that these simplified results are very useful and are sometimes building blocks of more complex solutions, as I describe later.

It turns out that, from time to time, I need those simplified solutions to solve a much bigger problem. After recomputing those results for the $n$-th time (sometimes I am somewhere without a basic statistics book at my side) I decided to write them here, so that next time I know exactly where to find them. Besides, it is always good to remember that simple solutions put together can give you quite powerful tools.

**Model specification**

Assume a simple Bayesian regression model where the response vector $\boldsymbol{y}$ has dimension $n \times 1$ and follows a multivariate Gaussian distribution with mean $X\boldsymbol{\beta}$ and covariance matrix $\sigma^2 I_n$, where $X$ is the design matrix and has dimension $n \times p$, $\boldsymbol{\beta}$ contains the $p$ regression coefficients, $\sigma^2$ is the common variance of the observations and $I_n$ is a $n \times n$ identity matrix. That is,

$$\boldsymbol{y} \sim N(X\boldsymbol{\beta}, \sigma^2 I_n).$$
The Bayesian model is completed by specifying a prior distribution for the coefficients $\boldsymbol{\beta}$ and for the precision $\phi = \sigma^{-2}$. Let's say $\boldsymbol{\beta}$ and $\phi$ are a priori independent with priors

$$\boldsymbol{\beta} \sim N(\boldsymbol{\mu}_\beta, \Sigma_\beta) \quad \text{and} \quad \phi \sim \text{Gamma}(a, b),$$

where $\boldsymbol{\mu}_\beta$ and $\Sigma_\beta$ are the mean and covariance matrix of the Gaussian distribution, while $a$ and $b$ are the shape and rate parameters of the Gamma distribution. $\boldsymbol{\mu}_\beta$, $\Sigma_\beta$, $a$ and $b$ are assumed to be known.
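As a quick sanity check of the model specification, here is a simulation from it in NumPy (the dimensions and hyperparameter values below are arbitrary choices of mine, not part of the post):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dimensions and hyperparameters.
n, p = 100, 3
mu_beta = np.zeros(p)           # prior mean of beta
Sigma_beta = 10.0 * np.eye(p)   # prior covariance of beta
a, b = 2.0, 1.0                 # Gamma shape and rate of the precision phi

# Draw phi and beta from their priors; note NumPy parameterizes
# the Gamma with a scale, so scale = 1 / rate.
phi = rng.gamma(shape=a, scale=1.0 / b)
sigma2 = 1.0 / phi
beta = rng.multivariate_normal(mu_beta, Sigma_beta)

# Draw y | beta, sigma^2 ~ N(X beta, sigma^2 I_n).
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
```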

**Posterior of $\boldsymbol{\beta}$ assuming $\sigma^2$ to be known**

If we assume $\sigma^2$ to be known, we have that the posterior for $\boldsymbol{\beta}$, given $\boldsymbol{y}$, is

$$\boldsymbol{\beta} | \boldsymbol{y}, \sigma^2 \sim N(\boldsymbol{\mu}^*, \Sigma^*),$$

where

$$\Sigma^* = \left(\Sigma_\beta^{-1} + \sigma^{-2} X^T X\right)^{-1} \quad \text{and} \quad \boldsymbol{\mu}^* = \Sigma^* \left(\Sigma_\beta^{-1} \boldsymbol{\mu}_\beta + \sigma^{-2} X^T \boldsymbol{y}\right).$$
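For instance, the posterior mean and covariance of the coefficients given the variance take only a few lines of NumPy (the function and argument names below are mine):

```python
import numpy as np

def beta_posterior(y, X, sigma2, mu_beta, Sigma_beta):
    """Gaussian posterior of the coefficients given the variance sigma2."""
    prec_prior = np.linalg.inv(Sigma_beta)
    # Posterior covariance: inverse of prior precision plus data precision.
    Sigma_post = np.linalg.inv(prec_prior + (X.T @ X) / sigma2)
    # Posterior mean: precision-weighted combination of prior mean and data.
    mu_post = Sigma_post @ (prec_prior @ mu_beta + (X.T @ y) / sigma2)
    return mu_post, Sigma_post
```

With a weak prior and plenty of data, the posterior mean approaches the ordinary least-squares estimate, as expected.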
**Posterior of $\phi$ assuming $\boldsymbol{\beta}$ to be known**

If we assume $\boldsymbol{\beta}$ to be known, we have that the posterior for $\phi$, given $\boldsymbol{y}$, is

$$\phi | \boldsymbol{y}, \boldsymbol{\beta} \sim \text{Gamma}(a^*, b^*),$$

where

$$a^* = a + \frac{n}{2} \quad \text{and} \quad b^* = b + \frac{(\boldsymbol{y} - X\boldsymbol{\beta})^T(\boldsymbol{y} - X\boldsymbol{\beta})}{2}.$$
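The updated shape and rate are equally short to compute (again, function and argument names are mine):

```python
import numpy as np

def phi_posterior(y, X, beta, a, b):
    """Gamma posterior (shape, rate) of the precision phi given beta."""
    resid = y - X @ beta
    a_post = a + len(y) / 2.0          # shape: add half the sample size
    b_post = b + 0.5 * resid @ resid   # rate: add half the residual sum of squares
    return a_post, b_post
```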

**Simple, yet useful**

You can argue that assuming $\sigma^2$ or $\boldsymbol{\beta}$ known in the context above is overly simplistic, and I agree with that. However, it turns out that knowing the distributions of $\boldsymbol{\beta} | \boldsymbol{y}, \sigma^2$ and $\phi | \boldsymbol{y}, \boldsymbol{\beta}$ can prove to be extremely useful in more complex situations, where they become building blocks of larger solutions.

Just to give two examples, both posteriors above are useful when computing full conditionals (the distribution of an unknown given all the other unknowns in the model), which are often necessary when implementing a Markov chain Monte Carlo (MCMC) scheme [1]. Another related case where this knowledge is helpful is when we want to implement Variational Bayes using a factorized approximation of an intractable posterior distribution (see for example Eq. (4) of my post about Variational Bayes).
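To illustrate the first example, a small Gibbs sampler for this model simply alternates draws from the two conditional posteriors. This is a sketch under the model of this post; all names and default values are mine:

```python
import numpy as np

def gibbs(y, X, mu_beta, Sigma_beta, a, b, n_iter=2000, seed=0):
    """Gibbs sampler alternating the two full conditionals of the model."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    prec_prior = np.linalg.inv(Sigma_beta)
    beta = np.zeros(p)  # arbitrary starting value
    draws_beta, draws_phi = [], []
    for _ in range(n_iter):
        # phi | y, beta ~ Gamma(a + n/2, b + ||y - X beta||^2 / 2)
        resid = y - X @ beta
        phi = rng.gamma(a + n / 2.0, 1.0 / (b + 0.5 * resid @ resid))
        # beta | y, phi ~ N(mu_post, Sigma_post)
        Sigma_post = np.linalg.inv(prec_prior + phi * (X.T @ X))
        mu_post = Sigma_post @ (prec_prior @ mu_beta + phi * (X.T @ y))
        beta = rng.multivariate_normal(mu_post, Sigma_post)
        draws_beta.append(beta)
        draws_phi.append(phi)
    return np.array(draws_beta), np.array(draws_phi)
```

Because both full conditionals are available in closed form, no Metropolis step is needed; each iteration is an exact draw from the corresponding conditional distribution.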

**References:**

[1] Robert, C.P., Casella, G. (2004). Monte Carlo statistical methods (Vol. 319). New York: Springer.

**Related posts:**

– Introduction to Variational Bayes