When you have two probability distributions and want to know how much they differ, you need a way to measure it. Over the years many metrics and distance measures have been developed and used; the most famous one is probably the Kullback-Leibler divergence. In a 2012 paper I showed that a metric called the Earth Mover’s Distance (EMD) offers considerable improvements in detecting differences between distributions. So it was a natural idea for me to use this measure whenever we want to compare two distributions.
The setting is as follows: the model prediction is a distribution defined by the ensemble members, and the observation comes with a non-parametric distribution describing its uncertainty. A standard tool for evaluating ensemble predictions nowadays is the continuous ranked probability score (CRPS). It compares the cumulative distribution function of the forecast with that of a deterministic observation, which is simply a step at the observed value. The paper builds on this tool and extends it to uncertain observations. Effectively, it measures the distance between two distributions, and by normalising this distance against a reference (e.g. the climate state) it yields a measure that distinguishes between good and bad predictions.
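The evaluation setting can be sketched as a skill score. The sketch below assumes that the observation uncertainty is represented by a sample drawn from its non-parametric distribution, that the reference is a climatological sample, and that the normalisation takes the common skill-score form 1 - EMD(forecast, observation) / EMD(reference, observation). The normalisation used in the paper may differ, and all function names and numbers here are hypothetical.

```python
def emd_1d(a, b):
    """EMD between two 1-D samples with equally weighted points (area between empirical CDFs)."""
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_a = sum(x <= left for x in a) / len(a)
        cdf_b = sum(x <= left for x in b) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def emd_skill_score(ensemble, observation, reference):
    """Hypothetical skill score: 1 = perfect, 0 = no better than the reference, < 0 = worse.

    Assumes the usual skill-score normalisation 1 - score / reference_score.
    """
    ref_distance = emd_1d(reference, observation)
    if ref_distance == 0.0:
        raise ValueError("reference already matches the observation; skill is undefined")
    return 1.0 - emd_1d(ensemble, observation) / ref_distance


# Hypothetical numbers, for illustration only.
ensemble = [2.1, 2.4, 2.8, 3.0]          # ensemble members of the model prediction
observation = [2.3, 2.5, 2.7]            # samples describing the observation uncertainty
climatology = [0.5, 1.5, 2.5, 3.5, 4.5]  # reference forecast, e.g. a climatological sample

print(round(emd_skill_score(ensemble, observation, climatology), 2))  # 0.8
```

With these numbers the ensemble is much closer to the uncertain observation than climatology is, so the score is clearly positive; an ensemble identical to the observation sample would score exactly 1.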
So how does the EMD work? Well, it measures how much work would be needed to transform one distribution into another. If you imagine a distribution as a pile of sand, the EMD is the minimal amount of fuel a machine would need to push the sand around until it forms the target distribution, where moving sand costs its amount multiplied by the distance it travels. This picture is also where the EMD got its name. As a metric, it measures this distance precisely and therefore allows us to say, given two predictions, which one is closer to the observations.
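To make the sand-pile picture concrete, here is a minimal Python sketch for the one-dimensional case, assuming every ensemble member or observation sample is an equally weighted grain of sand. In one dimension the EMD equals the area between the two cumulative distribution functions, and for two samples of equal size the cheapest transport plan simply pairs the sorted values. The function names `emd_1d` and `emd_by_matching` are hypothetical and chosen for illustration; this is not the implementation used in the paper.

```python
def emd_1d(a, b):
    """EMD between two 1-D samples with equally weighted points.

    In one dimension this equals the area between the two empirical CDFs.
    """
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_a = sum(x <= left for x in a) / len(a)
        cdf_b = sum(x <= left for x in b) / len(b)
        # Mass that still has to cross this interval, times the interval width.
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def emd_by_matching(a, b):
    """Same quantity for equal-sized samples: pair the sorted grains and move each one."""
    if len(a) != len(b):
        raise ValueError("the matching shortcut needs samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Shifting a pile of three grains by one unit costs one unit of work per grain.
print(emd_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))           # 1.0
print(emd_by_matching([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0

# Against a single deterministic observation, the EMD is the mean absolute error of the members.
print(emd_1d([1.0, 2.0, 3.0, 6.0], [2.0]))  # 1.5
```

For real data, `scipy.stats.wasserstein_distance` computes the same one-dimensional quantity (the first Wasserstein distance) and also accepts per-point weights.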
But it is important to mention that there are problems with this view. Similar to the discussion around the CRPS, there is literature describing that, despite their good properties, measures like the EMD are potentially too kind to false sharp predictions compared with uninformed ones. In the CRPS, the difference between the cumulative distributions is squared. Because this difference lies between zero and one, squaring shrinks the partial mismatches of a broad forecast much more than the full mismatch of a sharp forecast that misses the observation, so a wrong sharp prediction is penalised comparatively more. In my paper I also show results for this squared approach under the name integrated quadratic distance (IQD). A squared distance is much less intuitive than a linear one: it is harder for scientists to understand why they should prefer it over the others, which leads to a hesitant use of these kinds of measures. Therefore, in the future we will need to describe much better why these issues occur and to develop new pictures that explain to everyone why squaring is the way to go. More generally, we also need new approaches to verification, but I will write more about this in the final post of this series.
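The concern about false sharp predictions can be checked with a small synthetic example. Against a deterministic observation, the EMD reduces to the mean absolute error of the ensemble members, and the IQD reduces to the CRPS. The sketch below assumes the same equally weighted one-dimensional setting and uses two hypothetical forecasts: a sharp one that misses the observation by one unit, and a broad, nearly uninformative one. The helper names are illustrative only.

```python
def _integrate_cdf_difference(a, b, power):
    """Integral of |F_a - F_b|**power over the merged support of two 1-D samples."""
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = sum(x <= left for x in a) / len(a)
        cdf_b = sum(x <= left for x in b) / len(b)
        total += abs(cdf_a - cdf_b) ** power * (right - left)
    return total


def emd_1d(a, b):
    """EMD: linear penalty on the CDF difference."""
    return _integrate_cdf_difference(a, b, power=1)


def iqd_1d(a, b):
    """Integrated quadratic distance: squared penalty on the CDF difference."""
    return _integrate_cdf_difference(a, b, power=2)


observation = [0.0]                             # deterministic observation
sharp_but_wrong = [1.0] * 7                     # all members agree, but are off by one unit
broad = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # wide, nearly uninformative spread

for name, forecast in [("sharp but wrong", sharp_but_wrong), ("broad", broad)]:
    print(f"{name:>16}: EMD = {emd_1d(forecast, observation):.3f}, "
          f"IQD = {iqd_1d(forecast, observation):.3f}")
```

Here the EMD ranks the sharp but wrong forecast as better (1.000 versus about 1.714), whereas the IQD prefers the broad one (about 0.571 versus 1.000). A single synthetic case proves nothing in general, but it shows how squaring the CDF difference shifts the balance between sharpness and being right.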