In the final post of this background series I want to write about the need for new ideas in verification. Verification is essential in geo- and climate science, as it validates our work of predicting the future, whether on short or long timescales. Long-term prediction in particular poses the huge challenge of verifying our predictions on a small number of cases. We are happy when we have 30 or more events to estimate our skill, but we need ways to make quality statements on potentially far fewer cases. When we investigate, for example, El Niño events over the satellite period, we might have a time series of fewer than 10 time steps and come to a dead end with classical verification techniques. Contingency tables require many more cases, as otherwise the uncertainties become so large that they cannot be controlled. Correlation measures also depend heavily on a large number of cases: with fewer than about 30, the correlation needed to reach significance becomes very high. Still, most evaluations of long-term predictions rely on such methods.
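To make the significance argument concrete, here is a minimal sketch that computes the smallest correlation that is significant at the 5 % level (two-sided) for a given sample size. It assumes independent, bivariate-normal pairs and uses the textbook null distribution of Pearson's r, integrated numerically with the standard library only; the function names are illustrative. For n = 10 it gives roughly 0.63 and for n = 30 roughly 0.36, in line with standard tables.

```python
import math

def r_null_density(r, n):
    """Density of Pearson's r under H0 (rho = 0) for n >= 4 bivariate-normal pairs."""
    beta = math.gamma(0.5) * math.gamma((n - 2) / 2) / math.gamma((n - 1) / 2)
    return (1.0 - r * r) ** ((n - 4) / 2) / beta

def two_sided_p(c, n, steps=2000):
    """P(|r| >= c) under H0: twice the upper tail, integrated with Simpson's rule."""
    h = (1.0 - c) / steps
    total = r_null_density(c, n) + r_null_density(1.0, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * r_null_density(c + i * h, n)
    return 2.0 * total * h / 3.0

def critical_r(n, alpha=0.05):
    """Smallest |r| significant at level alpha, found by bisection on c."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if two_sided_p(mid, n) > alpha:
            lo = mid  # not yet significant: threshold lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (8, 10, 30, 100):
    print(n, round(critical_r(n), 3))
```

Serial correlation in a time series reduces the effective sample size and would push these thresholds even higher.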
An alternative idea has been proposed by DelSole and Tippett, which I first saw at the S2S2D conference in 2018. Instead of investigating a whole time series at once, as we would for correlations, it looks at single events. This allows us to evaluate the effect of every single time step on the verification and therefore provides new information in addition to that obtained from the whole time series.
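A minimal sketch of the event-by-event idea, assuming the absolute errors of two competing forecasts are already available for each time step: every step moves a random walk up when forecast A is closer to the observation and down when B is, and a two-sided sign (binomial) test asks whether the number of wins departs from a coin flip. This is a simplified reading in the spirit of DelSole and Tippett's random-walk comparison, not a reproduction of their procedure; ties are simply dropped, and the error values are invented for illustration.

```python
import math

def random_walk(err_a, err_b):
    """Cumulative position: +1 when A has the smaller error, -1 when B has, 0 for a tie."""
    walk, position = [], 0
    for ea, eb in zip(err_a, err_b):
        if ea < eb:
            position += 1
        elif eb < ea:
            position -= 1
        walk.append(position)
    return walk

def sign_test_p(wins_a, n):
    """Two-sided binomial p-value for wins_a successes in n non-tied trials, p = 0.5."""
    k = max(wins_a, n - wins_a)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical absolute errors of two forecasts over eight seasons
err_a = [0.2, 0.4, 0.1, 0.9, 0.3, 0.2, 0.5, 0.1]
err_b = [0.5, 0.6, 0.3, 0.2, 0.8, 0.4, 0.9, 0.6]

walk = random_walk(err_a, err_b)
wins_a = sum(ea < eb for ea, eb in zip(err_a, err_b))
n = sum(ea != eb for ea, eb in zip(err_a, err_b))  # ties are excluded
print(walk, wins_a, n, round(sign_test_p(wins_a, n), 4))
```

With 7 wins out of 8 seasons the p-value is about 0.07, so even such a clear-looking record is not significant at the 5 % level on so few cases.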
In the new paper I have shown that this approach also allows a paradigm shift in evaluating forecasts. Many previous approaches looked at a situation in which the evaluation of one year depends on the evaluation of the other years; counting the successes of each single year instead makes a prediction evaluation much more valuable. Often we do not ask how good a forecast is, but whether it is better than another forecast. And at the time of forecasting we want to know how likely it is that one forecast is better than another. Many standard verification techniques do not provide this information, as they take into account the magnitude of the difference between two forecasts at each time step. This is certainly important information, but it limits our view on essential questions of our evaluation. In principle, a single year can often decide whether one forecast is better than another. Or, more extreme: if one forecast is really bad in one year but better in all other years, it can still be beaten by the other forecast in terms of correlation. These consequences have to be taken into account when we verify our models with such techniques.
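The extreme case described above can be reproduced with a small constructed example (the numbers are invented for illustration): forecast A matches the observations exactly in nine of ten years but gets the sign wrong in the single strongest year, while forecast B is always in phase but damped by half. Counting yearly wins favours A by 9 to 1, yet the correlation of B is 1 and that of A only about 0.56.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented anomalies for ten years
obs = [1.0, -1.0, 0.5, -0.5, 2.0, -2.0, 1.5, -1.5, 1.0, -1.0]
fc_a = list(obs)
fc_a[4] = -2.0                 # A gets the sign wrong in the strongest year only
fc_b = [0.5 * o for o in obs]  # B is always in phase but damped by half

# Count the years in which each forecast has the smaller absolute error
wins_a = sum(abs(a - o) < abs(b - o) for a, b, o in zip(fc_a, fc_b, obs))
wins_b = sum(abs(b - o) < abs(a - o) for a, b, o in zip(fc_a, fc_b, obs))
corr_a, corr_b = pearson(fc_a, obs), pearson(fc_b, obs)

print(f"yearly wins  A: {wins_a}  B: {wins_b}")
print(f"correlation  A: {corr_a:.2f}  B: {corr_b:.2f}")
```

Because correlation ignores amplitude and is sensitive to a single large miss, it ranks the forecasts in the opposite order to the year-by-year count.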
It is therefore important to collect new ideas on how to verify forecasts and quantify their quality, including its uncertainties, in the face of the new challenges posed to us. The new paper applies new approaches in several of these areas, but there is certainly plenty of room for new ideas in this important field.
Data verification is one of the cornerstones of geoscience. Without knowing whether a prediction was correct, we cannot claim to be able to predict anything at all. Most verification nowadays is based on the assumption that observations are perfect, often without acknowledging any uncertainties. Standard tools like contingency tables and correlations (the latter often used in some form for long-term predictions) make it hard to take these uncertainties into account (even though it is possible, e.g. by sampling strategies).
Another problem is that obtaining observational uncertainties to work with is often not easy. One example is reanalysis data, which for a long time were only provided as a single realisation. So while predictions were often available as ensembles, the observations to compare them with were not. There are techniques that use aggregated data and validate their statistics, but most classical variables are still verified against observations treated as certain. The field is currently changing: reanalyses are becoming available as ensembles, so in the future we need new tools that make use of these developments.
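One possible sampling strategy for correlations with an observational ensemble, sketched under simple assumptions: for each draw, pick one member (e.g. of a reanalysis ensemble) at every time step, correlate the forecast with this sampled series, and look at the spread of the resulting correlations. Drawing members independently at each time step is an assumption made here for simplicity; if the members are coherent trajectories, sampling whole members would be the more natural choice. The function names and data are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def sampled_correlations(forecast, obs_members, draws=1000, seed=0):
    """Correlate the forecast with `draws` observation series, each built by
    picking one member at random at every time step (independent draws)."""
    rng = random.Random(seed)
    results = []
    for _ in range(draws):
        sampled_obs = [rng.choice(members) for members in obs_members]
        results.append(pearson(forecast, sampled_obs))
    return results

# Hypothetical forecast and a three-member observational ensemble per time step
forecast = [0.4, -0.2, 1.1, -0.8, 0.3, 0.9]
obs_members = [
    [0.6, 0.2, 0.5],
    [-0.1, -0.4, 0.1],
    [0.9, 1.3, 1.0],
    [-0.5, -0.9, -0.7],
    [0.0, 0.4, 0.2],
    [1.2, 0.8, 0.7],
]

rs = sorted(sampled_correlations(forecast, obs_members))
k = len(rs) // 20  # index of the 5 % quantile
print(f"5-95 % range of correlations: {rs[k]:.2f} to {rs[len(rs) - k - 1]:.2f}")
```

The width of this range indicates how much of the apparent skill depends on which observational realisation is used, information a single deterministic reference cannot provide.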
There is also a philosophical case for looking into verification with uncertain observations. We know that the real world is not deterministic, we know that our instruments are imperfect, and we are sure that these uncertainties matter. Why do we train our students to create and measure uncertainties when we later do not use them in our analyses? And yes, there is the issue that at their core all observations are models. We acknowledge that models are imperfect; otherwise we would not need ensembles to create predictions. But why, then, do we not account for the uncertainties arising from the models we apply when we create observations? Those models are certainly not much better (they are just applied on a different temporal and spatial scale). So we have to confront this issue at every step we take: we do so in data assimilation, and we have to do it in data verification as well.
Therefore, new developments in this field are essential. We need new tools to investigate uncertain observations and make use of them. This paper is a small step towards opening opportunities for future developments in this direction. It is certainly not a final solution, nor is it the first step; it is just another proposal for a tool to approach this challenge. In the future we need well-understood and tested tools that the broader scientific community can apply. What those might look like is currently open, as is whether the tools presented here are of any wider use. In the paper I described two metrics, the Earth Mover's Distance (EMD) and the Integrated Quadratic Distance (IQD), and developed a strategy to build verification tools with them. In the next post I will take a deeper look into the two metrics and shed light on the opportunities they offer.
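As a brief preview, both metrics can be written for one-dimensional distributions in terms of the cumulative distribution functions F and G of, say, a forecast ensemble and an observational ensemble: the EMD is the integral of |F(x) - G(x)| over x, and the IQD is the integral of (F(x) - G(x))^2. The sketch below evaluates both for empirical distributions, where the CDFs are step functions and the integrals reduce to sums over the gaps between the sorted values. It follows these standard definitions and is not necessarily identical to the implementation used in the paper; the helper names and ensemble values are illustrative.

```python
def ecdf(sample, x):
    """Empirical CDF of `sample` evaluated at x."""
    return sum(1 for s in sample if s <= x) / len(sample)

def _cdf_steps(a, b):
    """Yield (width, F_a, F_b) for each interval between consecutive support points;
    both empirical CDFs are constant on such an interval and equal outside it."""
    points = sorted(set(a) | set(b))
    for left, right in zip(points, points[1:]):
        yield right - left, ecdf(a, left), ecdf(b, left)

def emd(a, b):
    """Earth Mover's Distance: integral of |F_a - F_b|."""
    return sum(w * abs(fa - fb) for w, fa, fb in _cdf_steps(a, b))

def iqd(a, b):
    """Integrated Quadratic Distance: integral of (F_a - F_b)^2."""
    return sum(w * (fa - fb) ** 2 for w, fa, fb in _cdf_steps(a, b))

# Hypothetical forecast ensemble and reanalysis ensemble for one season
forecast_ens = [-0.6, -0.1, 0.3, 0.5, 0.9]
reanalysis_ens = [0.2, 0.4, 0.6]
print(f"EMD = {emd(forecast_ens, reanalysis_ens):.3f}, IQD = {iqd(forecast_ens, reanalysis_ens):.3f}")
```

When the observation is a single value rather than an ensemble, the IQD reduces to the continuous ranked probability score (CRPS) of the forecast ensemble, which is one reason it is a natural candidate for verification against uncertain observations.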
Recently my paper “Seasonal statistical-dynamical prediction of the North Atlantic Oscillation by probabilistic post-processing and its evaluation” was accepted in “Nonlinear Processes in Geophysics”, and now that it is published, I will follow tradition (see here, here and here) and use this blog to explain in more detail what it is about and which problems it addresses. Over the coming week I will highlight some background to the paper and show why some of the points I raise in it will be important for the development of the seasonal and decadal prediction community as well as for climate science more widely.
So my background stories will take a look at the following topics:
- What is sub-sampling?
- Why does sub-sampling work?
- Why do we need verification with uncertain observations?
- EMD and IQD? What is it about?
- Do we need new approaches in verification?
As always, these posts are just an addition to the paper itself and reflect only my personal view. As it is my first (and, I somehow hope, only) single-author paper, the manuscript of course mostly reflects my view anyway. Still, there are limits to what you can do in the scientific literature, and that is what these blog posts are about. And since it is a statistical paper, explaining it in more detail for those who are not fans of equations is certainly a bonus.