When you have two probability distributions and want to know how much they differ, you need a way to measure that difference. Over the years many metrics and distance measures have been developed and used, the most famous being the Kullback-Leibler divergence. In a 2012 paper I showed that a metric called the Earth Mover’s Distance (EMD) brings considerable improvements in detecting differences between distributions. So it was a natural idea for me to make use of this measure when comparing two distributions.
The starting point is a distribution given by the model prediction, defined by the ensemble members, and an observation with a non-parametric distribution of its uncertainties. A standard tool nowadays for evaluating ensemble predictions is the CRPS (continuous ranked probability score). It evaluates at which percentile of the predicted probability distribution the deterministic observation is found. The paper builds on this tool and extends it to uncertain observations. Effectively, it measures the distance between two distributions and, by normalising this distance against a reference (e.g. the climate state), creates a metric that distinguishes between a good and a bad prediction.
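The normalisation against a reference can be sketched in a few lines. The exact formula used in the paper is not given in this post, so the sketch below assumes the conventional skill score form, one minus the ratio of the forecast distance to the reference distance. The function name `skill_score` and its parameters are hypothetical.

```python
def skill_score(distance_forecast, distance_reference):
    """Conventional skill score form (an assumption, not taken from the paper).

    1 means the forecast matches the observation perfectly, 0 means it is
    no better than the reference (e.g. climatology), and negative values
    mean it is worse than the reference.
    """
    if distance_reference <= 0:
        raise ValueError("reference distance must be positive")
    return 1.0 - distance_forecast / distance_reference


# Hypothetical distances of a forecast and of a climatological reference
# to the same observed distribution.
print(skill_score(distance_forecast=0.5, distance_reference=2.0))
```

Any distance between distributions can be plugged in here, as long as the same distance is used for the forecast and for the reference.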
So how does the EMD work? It effectively measures how much work would be needed to transform one distribution into another. If you imagine a distribution as a pile of sand, the EMD measures the minimal amount of fuel a machine would need to push the sand around until it forms the target distribution. This picture is also where the EMD got its name. As a metric it measures the distance precisely, so when you have two predictions it lets you say which one is closer to the observations.
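For one-dimensional samples the sand pile picture can be sketched quite compactly, because the minimal transport cost equals the area between the two cumulative distribution functions. The following is a minimal illustration under that assumption, not the implementation from the paper; the function name `emd_1d` and the example ensembles are made up.

```python
from bisect import bisect_right


def emd_1d(sample_a, sample_b):
    """Earth Mover's Distance between two 1D samples with equal weights.

    Computed as the integral of |F_a(x) - F_b(x)| over x, where F_a and
    F_b are the empirical cumulative distribution functions.
    """
    ordered_a, ordered_b = sorted(sample_a), sorted(sample_b)
    grid = sorted(set(ordered_a) | set(ordered_b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_a = bisect_right(ordered_a, left) / len(ordered_a)
        cdf_b = bisect_right(ordered_b, left) / len(ordered_b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Made-up ensembles: prediction_2 is prediction_1 shifted further away.
observation = [0.0, 1.0, 2.0]
prediction_1 = [1.0, 2.0, 3.0]
prediction_2 = [3.0, 4.0, 5.0]
print(emd_1d(prediction_1, observation), emd_1d(prediction_2, observation))
```

Shifting every grain of sand by one unit costs one unit of work, so the first prediction ends up closer to the observation than the second.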
It is important to mention, however, that there are problems with this view. As with the CRPS, there is literature describing that, despite their properties, measures like the EMD are potentially too kind to falsely sharp predictions compared to uninformed ones. In the CRPS case the distance is squared, so that a wrong prediction, which needs a longer transport of probability, is penalised more strongly. In my paper I also show the results of this approach as IQD. A squared distance is much less intuitive than a linear one, and it is harder for scientists to understand why they should prefer it over the others, which leads to hesitant use of this kind of measure. It will therefore be necessary in the future to describe much better why these issues occur and to develop new pictures that explain to everyone why squaring is the way to go. We also need new approaches to verification in general, but I will write more on this in the final post of this series.
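A toy example may help to see the difference between the linear and the squared view. The sketch assumes that IQD stands for the integrated quadratic distance, i.e. the integral of the squared difference between the two cumulative distribution functions, and it uses made-up one-dimensional samples. The function name `cdf_distance` and its `power` parameter are hypothetical.

```python
from bisect import bisect_right


def cdf_distance(sample_a, sample_b, power):
    """Integrate |F_a(x) - F_b(x)| ** power over x for two 1D samples.

    power=1 gives the linear (EMD-like) distance, power=2 the squared
    variant assumed here to correspond to the IQD.
    """
    ordered_a, ordered_b = sorted(sample_a), sorted(sample_b)
    grid = sorted(set(ordered_a) | set(ordered_b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_a = bisect_right(ordered_a, left) / len(ordered_a)
        cdf_b = bisect_right(ordered_b, left) / len(ordered_b)
        total += abs(cdf_a - cdf_b) ** power * (right - left)
    return total


observation = [0.0]
sharp_but_wrong = [2.0]   # very confident, but misses the observation
broad = [-3.0, 3.0]       # uninformed, spread around the observation

for name, forecast in [("sharp but wrong", sharp_but_wrong), ("broad", broad)]:
    linear = cdf_distance(forecast, observation, power=1)
    squared = cdf_distance(forecast, observation, power=2)
    print(f"{name}: linear={linear}, squared={squared}")
```

In this toy case the linear distance ranks the sharp but wrong forecast ahead of the broad one, while the squared version reverses the ranking.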
The fourth day of the IMSC 2019 was the day the heat wave finally hit Toulouse with full force. The temperature measurements showed around 40 °C and it felt a bit overwhelming. The morning started again with plenary sessions and talks about uncertainty separation and downscaling. The poster session followed in a tent outside, and it got warmer and warmer as the wind was not as strong as on the previous days.
After lunch the parallel sessions for the day started and I chose the one on forecast evaluation. The first part was reserved for the development of new verification procedures, and I gave my own talk in this part. It went all right: I presented two new skill scores based on the EMD and demonstrated them on different seasonal prediction applications. The second half of the session was on the application of verification procedures and covered many different fields.
With the end of the talks it was time for the social events. The choice was either a wine tasting on a ship or a walking tour through town. I chose the latter, and it was a challenge to always find shade so as not to get too warm in the sunshine. Tomorrow will be the last day, and the weather will still be warm enough to be a challenge.
Last week NCAR in Boulder (Colorado) hosted the second edition of the International Conference on Subseasonal to Decadal prediction. It covered climate prediction from a few weeks up to a few years and, with around 350 scientists, brought together a good representation of the community in this field. During most of the days the conference was split into a subseasonal to seasonal (S2S) and a seasonal to decadal (S2D) session.
The International Conference on S2S2D poster
I personally attended only the S2D part, as my current work focuses on this topic. The first day looked into the mechanisms of predictability, and the typical candidates, like ocean, soil moisture and stratosphere, were discussed. The second day then shifted more towards the modelling of these phenomena. The weather services presented their new prediction systems and new approaches to modelling were discussed. The third topic covered the handling of the predictions. It looked at calibration and other techniques to make the predictions really useful. This led to the fourth topic, which discussed the decision-making process based on the predictions. Here, the applications were the main focus, and many different phenomena and their predictability were shown. Topic number five looked at statistical verification. It presented new approaches to assess the skill of the models. The final session of the S2D part looked at the frontiers of earth system prediction, and especially at the handling of carbon within the models. Afterwards, in a combined session of both parts, many different aspects of the future of research in this field were brought up. Among others, the temporal dependence of forecast skill and the so-called ‘signal to noise paradox’ led to a lively discussion.
My personal contributions were threefold. In the first session I showed on a poster how the Summer NAO can be predicted using ensemble sub-sampling. In the second session I presented a poster arguing that sub-sampling can be viewed as a post-processing procedure, which explains why it works. My talk in the fifth session then covered the 2D categorical EMD score.
All in all it was a great conference, with many interesting discussions and a great overview of this interesting field. Many impulses will certainly come from it and give a new push not only to my own research.
Playing around with data can be quite fun and sometimes delivers interesting results. I did this a lot in the past, mainly out of a necessity arising from my PhD. In it I developed some methods for quality assurance of data, which of course needed some interesting applications. So every time a nice dataset went live, I ran it through my methods, and usually the results were quite boring. The main reason is that these methods are designed to identify inhomogeneities, and a lot of the data published nowadays is already quality controlled (homogenised), which makes it quite hard to identify new properties within a dataset. Model data in particular is often quite smooth, so that it is necessary to look at quite old data to find something really interesting.