It is widely accepted that forecasts about uncertain events are probabilistic in nature, taking the form of probability distributions over future quantities or events. However, for practical reasons, we quite often need to report point estimates instead of the more informative predictive distribution. Examples of point forecasts are the mean or quantiles of the predictive distribution. [1] and [2] are two excellent papers by Tilmann Gneiting that present important guidelines on how to make and evaluate point forecasts.

**Scoring function**

Competing forecasts are compared and assessed by means of an error measure, such as the average of the scoring function $S$ over forecast cases:

$$\bar{S} = \frac{1}{n} \sum_{i=1}^{n} S(x_i, y_i),$$

where there are $n$ forecast cases with corresponding point forecasts, $x_1, \dots, x_n$, and verifying observations, $y_1, \dots, y_n$. Commonly used scoring functions are the squared error, $S(x, y) = (x - y)^2$, and the absolute error, $S(x, y) = |x - y|$.
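As a minimal sketch of the error measure above, the following computes the average squared and absolute errors over a handful of hypothetical forecast cases (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical point forecasts and verifying observations.
forecasts = np.array([2.1, 0.8, 1.5, 3.0])
observations = np.array([2.0, 1.0, 1.2, 2.5])

def mean_score(x, y, scoring_fn):
    """Average of a scoring function S over n forecast cases."""
    return np.mean(scoring_fn(x, y))

squared_error = lambda x, y: (x - y) ** 2
absolute_error = lambda x, y: np.abs(x - y)

print(mean_score(forecasts, observations, squared_error))   # mean squared error
print(mean_score(forecasts, observations, absolute_error))  # mean absolute error
```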

**Guidelines to effective point forecasting**

The main message of the papers by Gneiting mentioned above is that effective point forecasting depends on guidance or directives, which can be given in one of two complementary ways:

- Giving the scoring function to the forecaster, so that she can issue an optimal point forecast by applying the so-called Bayes rule,

$$\hat{x} = \arg\min_x \, \mathbb{E}_F \left[ S(x, Y) \right], \quad (1)$$

where the random variable $Y$ is distributed according to the forecaster’s predictive distribution, $F$. For example, if the scoring function is the squared error, the solution of Eq. (1) is known to be the mean of $F$, while if the scoring function is the absolute error, the solution is given by the median of $F$.

- An alternative to disclosing the scoring function is to request a specific functional of the forecaster’s predictive distribution, such as the mean or a quantile, and to apply any scoring function that is *consistent* with the functional. The papers give precise definitions of what consistency means in this context, but in lay terms it means that, for a given functional of the predictive distribution, we should evaluate it with a scoring function that would issue this functional as an optimal point forecast. A functional is *elicitable* if there exists a scoring function that is strictly consistent for it. Not every functional is elicitable; see for example the case of the Conditional Value-at-Risk in [1], which is widely used in finance.
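The two solutions of the Bayes rule in Eq. (1) can be checked numerically: minimizing the average squared error over a large sample drawn from $F$ recovers the mean, while minimizing the average absolute error recovers the median. A minimal sketch, with a lognormal distribution as an arbitrary (skewed) choice of $F$:

```python
import numpy as np

rng = np.random.default_rng(0)
# A large sample standing in for the predictive distribution F.
y = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)

# Candidate point forecasts on a fine grid.
candidates = np.linspace(0.1, 5.0, 491)
exp_sq = [np.mean((x - y) ** 2) for x in candidates]   # expected squared error
exp_abs = [np.mean(np.abs(x - y)) for x in candidates]  # expected absolute error

best_sq = candidates[np.argmin(exp_sq)]
best_abs = candidates[np.argmin(exp_abs)]
# best_sq lands near the sample mean of y; best_abs near the sample median.
```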

**Simple but important**

The guidelines above are simple to follow and yet very important. Failure to comply with them may cause hard-to-detect inconsistencies. Suppose you have a model which is known to give the best $\alpha$-quantile estimate of the predictive distribution of a given random variable $Y$. In addition, you have other models that are known to give sub-optimal $\alpha$-quantile estimates of $Y$. If you use the squared error scoring function, which is not consistent with the $\alpha$-quantile, to evaluate the $\alpha$-quantile estimates, you might end up picking one of the sub-optimal models, since you are using the wrong metric to assess the $\alpha$-quantile estimates.
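This failure mode is easy to reproduce. In the sketch below (using made-up lognormal data and two constant forecasters for simplicity), one forecaster issues the optimal median (0.5-quantile) forecast and the other issues the mean, which is sub-optimal as a median forecast. The inconsistent squared error ranks the wrong forecaster first:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.lognormal(0.0, 1.0, size=200_000)  # skewed verifying observations

median_forecast = np.median(y)  # optimal point forecast of the 0.5-quantile
mean_forecast = np.mean(y)      # sub-optimal as a median forecast

mse = lambda x: np.mean((x - y) ** 2)
mae = lambda x: np.mean(np.abs(x - y))

# Squared error, inconsistent with the median, prefers the wrong forecast...
assert mse(mean_forecast) < mse(median_forecast)
# ...while absolute error, consistent with the median, prefers the right one.
assert mae(median_forecast) < mae(mean_forecast)
```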

The $\alpha$-asymmetric piecewise linear scoring function is consistent with the $\alpha$-quantile. So the proper way to evaluate the merits of the $\alpha$-quantile estimates of different models would be to use

$$S_\alpha(x, y) = \begin{cases} \alpha \, |x - y|, & x \le y, \\ (1 - \alpha) \, |x - y|, & x > y, \end{cases}$$

with $\alpha \in (0, 1)$.
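A minimal implementation of this scoring function (often called the pinball loss), together with a numerical check that minimizing its average over a sample does recover the $\alpha$-quantile; the exponential distribution here is an arbitrary choice for illustration:

```python
import numpy as np

def quantile_score(x, y, alpha):
    """alpha-asymmetric piecewise linear (pinball) scoring function."""
    return np.where(x <= y, alpha * (y - x), (1 - alpha) * (x - y))

rng = np.random.default_rng(42)
y = rng.exponential(scale=1.0, size=100_000)
alpha = 0.9

# Minimize the average score over a grid of candidate point forecasts.
grid = np.linspace(0.5, 4.0, 701)
avg_scores = [np.mean(quantile_score(x, y, alpha)) for x in grid]
best = grid[np.argmin(avg_scores)]
# best lands near the empirical 0.9-quantile of y (ln 10 ~ 2.30 for Exp(1)).
```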

A section of [1] gives a nice example of the problems that can arise when one decides not to follow these guidelines when evaluating point forecasts.

**Conclusion**

It is important to follow specific guidelines when making and evaluating point forecasts: doing so avoids inconsistencies when comparing point forecasts from different models and/or different forecasters. Important as they are, it is not hard to find these guidelines being ignored. I myself have seen, more than once, the median of the predictive distribution used as a point forecast while its accuracy was assessed with the squared error scoring function.

**References:**

[1] Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association.

[2] Gneiting, T. (2011). Quantiles as optimal point forecasts. International Journal of Forecasting.
