Friday, March 7, 2014

An interesting epistemic scoring rule

A forecast p is an assignment of probabilities to events in some space Ω. A proper score is an assignment of a random variable s_p to each forecast p on that space, with the property that E_p s_p ≤ E_p s_q whenever p is a consistent forecast (one that satisfies the axioms of probability) and q is any other forecast. Here, E_p is expectation with respect to the probability function p. Propriety basically says that if we have a consistent forecast, then by our own lights no other forecast is expected to have a better score. The scores are thought of as penalties or distances from truth: smaller is better.

One thing proper scoring rules have been used for is to argue that our credences should be consistent. For instance, under a simplifying assumption, Predd et al. have basically shown that the proper score for an inconsistent forecast is always dominated (from below) by a proper score for some consistent forecast. The simplifying assumption is that scores are computed for individual events and added.

Now, here is a curious proper score that does not satisfy this simplifying assumption. Suppose we're working with a finite space Ω with n points. Suppose p is consistent. Let m(p) be a point of Ω at which p is maximized. (Use any tie-breaking method you like if that point isn't unique.) Then let s_p be 0 at m(p) and 1 everywhere else. Then if p and q are consistent, E_p s_q = 1 − p(m(q)) (where p(ω) = p({ω})). Since p(m(p)) ≥ p(m(q)) by definition of m, it follows that E_p s_p ≤ E_p s_q. Observe that p(m(p)) ≥ 1/n. Thus, E_p s_p ≤ 1 − 1/n. Finally, if p is inconsistent, let s_p be 1 − 1/n everywhere. Then E_p s_p ≤ 1 − 1/n = E_p s_q whenever p is consistent and q is inconsistent, so s is a proper score.
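To make the construction concrete, here is a minimal Python sketch that checks propriety by brute force on a small grid. It is only an illustration under assumptions of my own: a consistent forecast is represented by its list of point probabilities, ties are broken by taking the lowest index, n = 3, and the grid uses multiples of 1/6. The function names (best_guess, score, inconsistent_score, expect, grid) are hypothetical and not from any library.

```python
from fractions import Fraction
from itertools import product

def best_guess(p):
    """Index m(p) of a maximum of p; ties go to the lowest index (an arbitrary choice)."""
    return max(range(len(p)), key=lambda i: p[i])

def score(p):
    """s_p for a consistent p: 0 at the best guess, 1 everywhere else."""
    m = best_guess(p)
    return [0 if i == m else 1 for i in range(len(p))]

def inconsistent_score(n):
    """s_q for any inconsistent q: the constant 1 - 1/n."""
    return [Fraction(n - 1, n)] * n

def expect(p, s):
    """E_p s: expectation of the score vector s under point probabilities p."""
    return sum(pi * si for pi, si in zip(p, s))

def grid(n, steps):
    """All probability vectors on n points whose entries are multiples of 1/steps."""
    return [[Fraction(k, steps) for k in ks]
            for ks in product(range(steps + 1), repeat=n) if sum(ks) == steps]

n = 3
forecasts = grid(n, 6)
# E_p s_p <= E_p s_q for every pair of consistent grid forecasts.
proper = all(expect(p, score(p)) <= expect(p, score(q))
             for p in forecasts for q in forecasts)
# E_p s_p <= 1 - 1/n, the expected score of any inconsistent forecast.
bounded = all(expect(p, score(p)) <= expect(p, inconsistent_score(n))
              for p in forecasts)
print(proper, bounded)
```

A grid check is of course not a proof, but exact rational arithmetic at least rules out rounding artifacts in the comparisons.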

For consistent forecasts, our s is a best guess score: a forecast's maximum point (with whatever tie breaker one likes) counts as the forecast's "best guess", and we get the perfect score 0 if we guessed right, and we get 1 otherwise. And for inconsistent forecasts, I just assigned a value that makes the score proper and, well, that makes what I am about to say true.

Namely: the above score s does not have the domination property that I talked about earlier. Let q be any inconsistent forecast. Then s_q is 1 − 1/n everywhere. If p is any consistent forecast, however, then s_p is 1 at all but one point, and 1 > 1 − 1/n, so (as long as n ≥ 2) s_p does not dominate s_q from below.
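The failure of domination can be seen in a few lines of Python. This is a sketch under my own assumptions: n = 4, three hand-picked consistent forecasts given as point probabilities, ties broken by lowest index, and "dominates from below" read in the weak sense (no worse at every point). Failing the weak version implies failing any stricter version too. The helper names are hypothetical.

```python
from fractions import Fraction

def score(p):
    """Best-guess score for a consistent p: 0 at a maximum of p, 1 elsewhere."""
    m = max(range(len(p)), key=lambda i: p[i])
    return [0 if i == m else 1 for i in range(len(p))]

def dominates_from_below(s_a, s_b):
    """True if s_a <= s_b at every point (smaller is better)."""
    return all(a <= b for a, b in zip(s_a, s_b))

n = 4
# Score of any inconsistent forecast: the constant 1 - 1/n.
s_q = [Fraction(n - 1, n)] * n

samples = [
    [Fraction(1, 4)] * 4,                                        # uniform
    [Fraction(1), Fraction(0), Fraction(0), Fraction(0)],        # certain
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)],
]
results = [dominates_from_below(score(p), s_q) for p in samples]
print(results)  # [False, False, False]
```

Each consistent score is 1 at three of the four points, and 1 exceeds 3/4, so the comparison fails no matter which forecast is chosen.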

Now, our score s is not a strictly proper score (Predd et al. actually work with strictly proper scores): for a strictly proper score s, whenever q differs from p and p is consistent, we will have E_p s_p < E_p s_q. But we can make our score strictly proper. Fix a small constant c > 0. Then s + cb, where b is the standard Brier score, will be strictly proper. But if c is small enough, s + cb will also fail to have the domination property.
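Here is a rough Python sketch of the combined score s + cb on a three-point space. Several things in it are my assumptions rather than anything fixed by the argument: the Brier score is taken to be the squared error summed over all 2^n events, c = 1/100, the sample inconsistent forecast assigns 1/2 to every event, and strict propriety is only spot-checked on a grid of multiples of 1/4. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

N = 3
POINTS = range(N)
# All events (subsets of Omega), including the empty set and Omega itself.
EVENTS = [frozenset(s) for r in range(N + 1) for s in combinations(POINTS, r)]

def event_forecast(point_probs):
    """Extend point probabilities to a consistent forecast on all events."""
    return {A: sum((point_probs[i] for i in A), Fraction(0)) for A in EVENTS}

def is_consistent(f):
    """Check the probability axioms: non-negative, total mass 1, finitely additive."""
    point = [f[frozenset([i])] for i in POINTS]
    additive = all(f[A] == sum((f[frozenset([i])] for i in A), Fraction(0))
                   for A in EVENTS)
    return all(x >= 0 for x in point) and sum(point) == 1 and additive

def best_guess_score(f):
    """The score s: constant 1 - 1/N if inconsistent, else 0 at the best guess, 1 elsewhere."""
    if not is_consistent(f):
        return [Fraction(N - 1, N)] * N
    m = max(POINTS, key=lambda i: f[frozenset([i])])
    return [Fraction(0) if i == m else Fraction(1) for i in POINTS]

def brier(f):
    """Additive Brier score: squared error summed over all events, for each outcome."""
    return [sum((f[A] - (1 if w in A else 0)) ** 2 for A in EVENTS) for w in POINTS]

def combined(f, c):
    """The score s + c*b as a vector indexed by outcome."""
    return [s + c * b for s, b in zip(best_guess_score(f), brier(f))]

def expect(point_probs, score):
    """Expectation of a score vector under the given point probabilities."""
    return sum(p * s for p, s in zip(point_probs, score))

c = Fraction(1, 100)
GRID = [[Fraction(k, 4) for k in ks]
        for ks in product(range(5), repeat=N) if sum(ks) == 4]
q_bad = {A: Fraction(1, 2) for A in EVENTS}   # inconsistent: empty set gets 1/2
t_bad = combined(q_bad, c)

# For each consistent p, is there an outcome where p scores worse than q_bad?
no_domination = all(any(a > b for a, b in zip(combined(event_forecast(p), c), t_bad))
                    for p in GRID)
print(no_domination)
```

With these choices the Brier term is at most 8c = 0.08 at each outcome, which is less than 1/n = 1/3, so the gap between 1 and 1 − 1/n in the best-guess part survives the perturbation.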

We should already have been suspicious of the argument for consistency based on proper scores and domination when proper scores were defined: the definition treated consistent forecasts in a special way (i.e., E_p s_p ≤ E_p s_q was only required when p is consistent; of course, it's hard to define E_p when p is inconsistent, so there is some excuse). But now we have even more reason to be suspicious: it is only some proper scores that have the property that scores of inconsistent forecasts are dominated by scores of consistent ones. Now, if we had some philosophical reason to think that the right way to score forecasts is by adding up scores for individual events, this would be better. But I don't know of such a philosophical reason.
