Posterior predictive checks

The main idea behind posterior predictive checking is the notion that, if the model fits, then replicated data generated under the model should look similar to observed data.

Replicating data

Assume you have a model {M} with unknown parameters {\theta}. You fit {M} to data {y} and obtain the posterior distribution {\pi(\theta|y)}. Given that

\displaystyle \pi(y_{rep}|y) = \int \pi(y_{rep}|\theta)\pi(\theta|y)d\theta

we can simulate {K} replicated datasets from the fitted model by drawing {K} samples {\{\theta^{(1)}, ..., \theta^{(K)}\}} from the joint posterior distribution {\pi(\theta|y)} and then, for each {\theta^{(i)}, i = 1,...,K}, simulating a dataset {y_{rep}^{(i)}} from the likelihood {\pi(y|\theta^{(i)})}.

Notice that with simulation-based techniques for approximating posterior distributions, such as MCMC, we already have draws from the posterior distribution, so the only extra work is simulating {y_{rep}^{(i)}} from {\pi(y|\theta^{(i)})} for each posterior draw.
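As a minimal sketch of this recipe, consider a hypothetical model {y_i \sim N(\theta, 1)} with a flat prior on {\theta}, so the posterior is available in closed form (the model and all numbers below are placeholder choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: y_i ~ N(theta, 1) with a flat prior on theta,
# so the posterior is theta | y ~ N(ybar, 1/n)
y = rng.normal(loc=2.0, scale=1.0, size=50)
n, K = len(y), 1000

# K draws {theta^(1), ..., theta^(K)} from the posterior
theta_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(n), size=K)

# For each draw, simulate a replicated dataset from the likelihood pi(y | theta^(i))
y_rep = np.array([rng.normal(loc=t, scale=1.0, size=n) for t in theta_draws])
print(y_rep.shape)  # (1000, 50): one replicated dataset per posterior draw
```

With MCMC output, `theta_draws` would simply be the chain after burn-in; only the last step is extra work.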

Test quantities and tail probabilities

We measure the discrepancy between model and data by defining test quantities (or discrepancy measures) {T(y, \theta)} that capture the aspects of the data we want to check. The posterior predictive p-value (or Bayesian p-value) is then defined as the probability that the replicated data could be more extreme than the observed data, as measured by the test quantity {T}:

\displaystyle p_B = Pr(T(y_{rep}, \theta) \geq T(y, \theta)|y)

In contrast to the classical approach, the test quantity used to compute the Bayesian p-value can depend not only on the data {y}, but also on the unknown parameters {\theta}. Hence, it does not require special methods for dealing with nuisance parameters. Also, a Bayesian p-value is a posterior probability and can therefore be interpreted directly, although not as Pr(model is true|data).
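Given replicated datasets, {p_B} is estimated as the fraction of replications for which the test quantity equals or exceeds its observed value. A sketch under the same hypothetical normal model as above, using the sample maximum as test statistic (all numbers are placeholder choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: y_i ~ N(theta, 1), flat prior, theta | y ~ N(ybar, 1/n)
y = rng.normal(loc=0.0, scale=1.0, size=100)
n, K = len(y), 2000
theta_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(n), size=K)
y_rep = rng.normal(loc=theta_draws[:, None], scale=1.0, size=(K, n))

# Test statistic: the sample maximum (a pure function of the data here,
# though in general T may also depend on theta)
T = np.max
p_B = np.mean([T(y_rep[i]) >= T(y) for i in range(K)])
```

Values of `p_B` very close to 0 or 1 flag an aspect of the data the model fails to reproduce; since the model above is correctly specified, a moderate value is expected.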

Choice of test statistics

One important point in applied statistics is that a model can be adequate for some purposes and inadequate for others, so it is important to choose test statistics that check characteristics of the model relevant to the given application. One should protect against errors in the model that would have bad consequences for the objective under study. The choice of appropriate test statistics is therefore highly dependent on the specific problem at hand.

References:

[1] Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian data analysis. CRC press (chapter 6).


Optimizing R with Multi-threaded OpenBLAS

You can obtain a huge speed-up if you configure R to use a multi-threaded OpenBLAS library instead of the reference BLAS library that ships with R and Ubuntu.

In case you have missed it, I would like to point out that Nathan VanHoudnos has shown on his blog how to easily install ATLAS and OpenBLAS and how to switch between them. Also, Michael Rutter has shown how to install the multi-threaded version of OpenBLAS on Ubuntu.

I have made the changes mentioned above and obtained a huge speed-up according to the benchmark script they provide. It is funny that we sometimes spend far too much time optimizing algorithms for specific problems and forget to look at these potentially significant software optimizations.
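For reference, on Ubuntu the switch between BLAS implementations goes through the alternatives system. A minimal sketch (package and alternative names vary by Ubuntu release, so check your system first):

```shell
# Install the multi-threaded OpenBLAS (package name varies by release)
sudo apt-get install libopenblas-base

# Interactively select which BLAS library R (and everything else) links to;
# on older releases the alternative may be named libblas.so.3gf instead
sudo update-alternatives --config libblas.so.3

# Optionally limit the number of threads OpenBLAS uses
export OPENBLAS_NUM_THREADS=4
```

After switching, restart R so it picks up the new shared library.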

LaTeX and WordPress.com

I use LaTeX2WP to write wordpress.com posts that require mathematical symbols. It is a Python program developed by Luca Trevisan that converts a LaTeX file into something that is ready to be cut and pasted into WordPress.

Usage

You can download LaTeX2WP following these instructions. Once you have a .tex document with LaTeX features compatible with LaTeX2WP (see below), you just need to type

python latex2wp.py yourFile.tex

in your command line, where latex2wp.py is one of the files you downloaded. This will generate yourFile.html, which you can cut and paste into your wordpress.com text editor. Done!

LaTeX Features

It supports many LaTeX features. The ones I use the most are:

  • Link to URLs without snapshot preview: \hrefnosnap{http://www.google.com}{link_text}.
  • Numbered and unnumbered equations via \begin{equation} ... \end{equation}, $$ ... $$ and \[ ... \].
  • eqnarray* is supported, but not eqnarray.
  • Refer to equations and theorems with \ref, \eqref and \label commands.
  • The theorem-like environments theorem, lemma, proposition, remark, corollary, example and exercise are defined, as is the proof environment.
  • {\bf text} for bold text and {\em text} for italic.

Limitations and workarounds

Obviously, it has some limitations. One that annoys me a little is that, as far as I know, it doesn’t handle bibliographic references well.

However, when LaTeX2WP cannot do what you want, you can always put HTML code in the scope of a \ifblog . . . \fi. The conversion will output such HTML code verbatim, and it will be ignored in the LaTeX preview. Conversely, anything in the scope of \iftex . . . \fi is compiled in the LaTeX preview but skipped in the conversion.
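For instance, a bibliographic reference can be handled with this mechanism; a small sketch, where the URL and citation key are just placeholders:

```latex
% Output verbatim as HTML in the WordPress conversion
\ifblog
  <a href="http://example.com/references.html">[1] Gelman et al. (2003)</a>
\fi
% Compiled only in the LaTeX preview, skipped by the conversion
\iftex
  \cite{gelman2003}
\fi
```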

Another workaround, for when a LaTeX feature you want is not available in LaTeX2WP, is to use WordPress’s own syntax for LaTeX equations. Another path that has helped me a couple of times is to use one of the many online LaTeX converters (like this one, for example) and then upload the resulting image yourself.

Further Reading

You can find more detailed examples of LaTeX2WP’s capabilities by looking at this .tex and .pdf example provided by Luca Trevisan, as well as his blog post about it.

Bayesian linear regression model – simple, yet useful results

I remember that, when I started studying statistics, I used to criticize statistical books that insisted on starting the subject of Bayesian regression models by obtaining the posterior distribution of the regression coefficients {\beta} while assuming the variance {\sigma^2} to be known. Or the other way around, obtaining the posterior of {\sigma^2} while assuming {\beta} known. Of course, my lack of experience at the time prevented me from realizing that these simplified results are very useful and are sometimes the building blocks of more complex solutions, as I describe later.

It turns out that, from time to time, I need those simplified solutions to solve a much bigger problem. After recomputing those results for the {n}-th time (sometimes I am somewhere without a basic statistics book at my side), I decided to write them down here, so that next time I know exactly where to find them. Besides, it is always good to remember that simple solutions put together can give you quite powerful tools.

Model specification

Assume a simple Bayesian regression model where the response vector {y} has dimension {n \times 1} and follows a multivariate Gaussian distribution with mean {X\beta} and covariance matrix {\sigma^2 I}, where {X} is the {n \times p} design matrix, {\beta} contains the {p} regression coefficients, {\sigma^2} is the common variance of the observations and {I} is the {n \times n} identity matrix. That is,

\displaystyle y \sim N(X\beta, \sigma^2 I).

The Bayesian model is completed by specifying a prior distribution for the coefficients {\beta} and for the precision {\phi = \sigma^{-2}}. Let’s say {\beta} and {\phi} are a priori independent, with priors

\displaystyle \beta \sim N(\mu_0, \Sigma_0) \quad \text{and} \quad \phi \sim \text{Gamma}(a_0, b_0),

where {\mu_0} and {\Sigma_0} are the mean and covariance matrix of the Gaussian distribution, while {a_0} and {b_0} are the shape and rate parameters of the Gamma distribution. {\mu_0}, {\Sigma_0}, {a_0} and {b_0} are assumed to be known.

Posterior of {\beta} assuming {\phi} to be known

If we assume {\phi} to be known, we have that the posterior for {\beta}, given {\phi}, is

\displaystyle \beta|y, \phi \sim N(\mu_1, \Sigma_1),

where

\displaystyle \Sigma_1 = (\Sigma_0^{-1} + \phi X^T X)^{-1} \quad \text{and} \quad \mu_1 = \Sigma_1(\Sigma_0^{-1} \mu_0 + \phi X^T y).

Posterior of {\phi} assuming {\beta} to be known

If we assume {\beta} to be known, we have that the posterior for {\phi}, given {\beta}, is

\displaystyle \phi|y, \beta \sim \text{Gamma}(a_1, b_1),

where

\displaystyle a_1 = a_0 + \frac{n}{2} \quad \text{and} \quad b_1 = b_0 + \frac{1}{2}(y - X\beta)^T(y - X\beta).
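As a quick numerical sketch of both conditional posteriors, the formulas above translate directly into a few lines of Python (the data are simulated and all hyperparameter values are arbitrary placeholder choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from the model: y ~ N(X beta, sigma^2 I)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true, sigma2_true = np.array([1.0, -0.5, 2.0]), 0.25
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=n)

# Priors: beta ~ N(mu0, Sigma0), phi ~ Gamma(a0, b0)
mu0, Sigma0 = np.zeros(p), 10.0 * np.eye(p)
a0, b0 = 0.01, 0.01

# Posterior of beta given phi (here plugging in the true precision)
phi = 1.0 / sigma2_true
Sigma1 = np.linalg.inv(np.linalg.inv(Sigma0) + phi * X.T @ X)
mu1 = Sigma1 @ (np.linalg.inv(Sigma0) @ mu0 + phi * X.T @ y)

# Posterior of phi given beta (here plugging in the true coefficients)
resid = y - X @ beta_true
a1 = a0 + n / 2
b1 = b0 + 0.5 * resid @ resid
```

With {n = 200} observations, `mu1` lands close to the true coefficients, since the likelihood dominates the weak prior.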

Simple, yet useful

You can argue that assuming {\beta} or {\phi} known in the context above is overly simplistic, and I agree. However, knowing the distributions of {\beta|y, \phi} and {\phi|y, \beta} can prove extremely useful in more complex situations, where they become building blocks of more complex solutions.

Just to give two examples, both posteriors above are useful when computing full conditionals (the distribution of an unknown given all the other unknowns in the model), which are often necessary when implementing a Markov chain Monte Carlo (MCMC) scheme [1]. Another related case where this knowledge is helpful is when we want to implement Variational Bayes using a factorized approximation of an intractable posterior distribution (see, for example, Eq. (4) of my post about Variational Bayes).
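To make the building-blocks point concrete, here is a minimal Gibbs sampler sketch that simply alternates between the two conditional posteriors derived above (the data are simulated and the hyperparameters are placeholder choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data: y ~ N(X beta, sigma^2 I)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma2_true = np.array([1.0, 2.0]), 0.5
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=n)

# Hyperparameters: beta ~ N(mu0, Sigma0), phi ~ Gamma(a0, b0)
mu0, Sigma0_inv = np.zeros(p), np.eye(p) / 10.0
a0, b0 = 0.01, 0.01

K, phi = 2000, 1.0  # number of iterations and an arbitrary starting value
beta_draws = np.empty((K, p))
phi_draws = np.empty(K)
for k in range(K):
    # Full conditional: beta | y, phi ~ N(mu1, Sigma1)
    Sigma1 = np.linalg.inv(Sigma0_inv + phi * X.T @ X)
    mu1 = Sigma1 @ (Sigma0_inv @ mu0 + phi * X.T @ y)
    beta = rng.multivariate_normal(mu1, Sigma1)
    # Full conditional: phi | y, beta ~ Gamma(a1, b1); numpy uses scale = 1/rate
    resid = y - X @ beta
    phi = rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * resid @ resid))
    beta_draws[k], phi_draws[k] = beta, phi

burn = K // 2
print(beta_draws[burn:].mean(axis=0), 1.0 / phi_draws[burn:].mean())
```

After burn-in, the posterior means recover {\beta} and {\sigma^2 = 1/\phi} closely, which is all the two simple results above buy us when stitched together.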

References:

[1] Robert, C.P., Casella, G. (2004). Monte Carlo statistical methods (Vol. 319). New York: Springer.

Related posts:

Introduction to Variational Bayes