Data verification is one of the cornerstones of geoscience. Without knowing whether a prediction has been correct, we cannot claim to be able to predict anything at all. Most verification today rests on the assumption that observations are perfect, often without acknowledging any of their uncertainties. Standard tools like contingency tables and correlations (the latter often used in some form in long-term prediction) make it hard to take these uncertainties into account, even when it is possible, e.g. via sampling strategies.
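To make the sampling idea concrete, here is a minimal sketch (my own illustration, not taken from any verification package) of how observation uncertainty could be propagated into a contingency-table score by resampling the observations. The data, the assumed Gaussian error and the threshold are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: binary forecasts and continuous observed values.
forecast_event = rng.random(200) < 0.3
obs_value = rng.normal(0.0, 1.0, 200)
obs_sigma = 0.2        # assumed observation error (illustrative)
threshold = 0.5        # an "event" occurs when the value exceeds this

def hit_rate(fc_event, ob_event):
    """Hits / (hits + misses) from a 2x2 contingency table."""
    hits = np.sum(fc_event & ob_event)
    misses = np.sum(~fc_event & ob_event)
    return hits / (hits + misses) if hits + misses else np.nan

# Resample the observations within their uncertainty and recompute the
# score each time, turning a single number into a distribution.
scores = np.array([
    hit_rate(forecast_event,
             obs_value + rng.normal(0.0, obs_sigma, obs_value.size) > threshold)
    for _ in range(1000)
])
```

The spread of `scores` then shows how much of the apparent forecast skill is robust against the observation error.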
Another problem is that obtaining observational uncertainties to work with is often not easy. An example are reanalysis data, which for a long time have been provided only as a single realisation. This led to the situation that while predictions were often available as ensembles, the observations to compare them against were not. There are techniques for using aggregated data and validating their statistics, but for most classical variables verification is still done with observations treated as certain. The field is changing, though: reanalyses are starting to become available as ensembles, so in the future we will need new tools that make use of these developments.
But also on the philosophical side there is a need to look into verification with uncertain observations. We know that the real world is not deterministic, we know that our instruments are imperfect, and we are sure that these uncertainties matter. Why do we train our students in estimating and measuring uncertainties when we later do not use them in our analyses? And yes, there is the issue that all observations are, at their core, models. We acknowledge that models are imperfect; otherwise we would not need ensembles for creating predictions. So why do we not account for the uncertainties introduced by the models we apply when we create observations? Those models are certainly not much better; they are just applied on different temporal and spatial scales. We have to confront this issue in every step we take: we do it in data assimilation, so we have to do it in verification as well.
Therefore, new developments in this field are essential. We need new tools that embrace uncertain observations and make use of them. This paper is a small step towards opening opportunities for future developments in this direction. It is certainly not a final solution, and certainly not the first step; it is just another proposed tool for approaching this challenge. In the future we will need well-understood, well-tested tools that the broader scientific community can apply. What those might look like is currently open, as is the question of whether the tools presented here will find wider use. In the paper I described two metrics, the EMD and the IQD, and developed a strategy for building verification tools from them. In the next post I will take a deeper look into the two metrics and shine a light on the opportunities they offer.
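To give an impression of what such metrics compute, here is a minimal sketch for one-dimensional ensembles. The `iqd` function is my own illustration of an integrated quadratic distance between empirical CDFs, not the paper's code, and both ensembles are synthetic:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def iqd(sample_a, sample_b, grid_size=1000):
    """Integrated quadratic distance: the integral of (F_a - F_b)^2 over x,
    approximated on a regular grid spanning both samples."""
    a, b = np.sort(sample_a), np.sort(sample_b)
    x = np.linspace(min(a[0], b[0]), max(a[-1], b[-1]), grid_size)
    f_a = np.searchsorted(a, x, side="right") / a.size  # ECDF of sample_a
    f_b = np.searchsorted(b, x, side="right") / b.size  # ECDF of sample_b
    return np.sum((f_a - f_b) ** 2) * (x[1] - x[0])     # rectangle rule

rng = np.random.default_rng(42)
forecast = rng.normal(0.3, 1.0, size=50)   # synthetic forecast ensemble
observed = rng.normal(0.0, 1.0, size=10)   # synthetic observation ensemble

emd = wasserstein_distance(forecast, observed)  # EMD in one dimension
print(f"EMD: {emd:.3f}, IQD: {iqd(forecast, observed):.3f}")
```

Both distances are zero only when the two distributions agree, which is what makes them candidates for comparing an ensemble forecast against an ensemble of observations.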
The year is coming to an end, and like so many others I think it is the right time to look back on what happened over the past twelve months. For me personally it was a year of transition, which is a natural step when you change your affiliation. I moved from the UK back to Germany, which has a different science system and a different way of handling and honouring science in society. Additionally, I changed my main research topic once again, so I have many things to learn.
The past three years were about sea-level science, mainly in a palaeoclimatic setting. I really like that topic and I am happy that there are still some things for me to do in this field. My new topic, seasonal prediction, is more about the future, and it has its beautiful spots as well. Both topics have something in common: of course it is statistics and the development of new methodologies, as this is my main research focus, but it is also their relevance for society, their impact, that makes both topics very attractive. Nevertheless, the differences go beyond the covered time span and the physical system at hand; the two fields also represent two completely different ways of modelling. My sea-level research relied mainly on simple modelling approaches with a (very) large number of ensemble members. Seasonal prediction uses some of the biggest models in Earth science, which need a huge amount of computing capacity, so only a small number of ensemble members can be produced.
But as a scientist it is of course important what output you produce. Based on the usual statistics it is not much: one conference, no publication. Sounds really sad. Looking deeper into it, though, it is not that bad. There are still two papers in review and some in preparation, and I attended several (project) meetings, a summer school and, yes, some job interviews. Apart from that I am involved in teaching again, which lets me learn a lot and is quite fun. The upcoming year will have to be more productive in terms of output, but I am optimistic that this will work out.
Therefore, I wish everybody a great start into the new year, and may some nice Christmas projects come to fruition.
Last year, during a larger meeting, I made a comment that made a lot of attendees shake their heads and others just smile. The statement was:
“Observations represent the truth, models the state of our understanding.”
As I have said before, at first sight it is of course rubbish to claim that observations have anything to do with the truth. Indeed, truth is a grand word with many different meanings and implications. In the context above, “truth” (which should anyhow always be set in quotation marks) describes the best possible estimate of the real world achievable with the currently available technology in real-world situations. When I write things up myself, I usually use a measurement operator to make clear that observations are never able to describe the full reality. However much effort observers put into it (and they usually do an amazing job), the real physical state of a physical system can only be approximated.
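As a toy illustration of what I mean by a measurement operator: the "true" state lives on a model grid, while the observation samples it imperfectly at one location and adds instrument noise. Every name, number and the sine-shaped field here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

grid = np.linspace(0.0, 10.0, 11)   # model grid coordinates (hypothetical)
true_state = np.sin(grid)           # the "true" field, known only on the grid

def measurement_operator(state, station_x, noise_std=0.1):
    """Observe the field at station_x: interpolate to the station
    location, then add instrument noise."""
    interpolated = np.interp(station_x, grid, state)
    return interpolated + rng.normal(0.0, noise_std)

obs = measurement_operator(true_state, station_x=3.7)
# obs approximates sin(3.7), but carries both interpolation error
# (the station sits between grid points) and instrument noise
```

Even in this trivial setting the observation differs from the real value for two distinct reasons, which is exactly why the operator formalism is useful.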
In my last post I showed that observations are models as well. But if this is the case, why do we distinguish between these two kinds of data the way we do? Why is everyone so keen on observations when they are just another model output?
The reason usually lies in their different structure. The amount of modelling applied to an observation for it still to be called an observation should usually be very limited. Coming from the atmospheric sciences myself, I find that the border between the two worlds can often be drawn by the type of the data: observations in this field are generally point data, often in situ, and irregular in time and space. Model data, in contrast, is usually very regular and sometimes high-dimensional.
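Bridging the two worlds in practice often means sampling the regular model field at the irregular station locations. A hypothetical sketch (the grid, the toy field and the station coordinates are invented for illustration):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

lat = np.linspace(-90.0, 90.0, 19)   # regular model grid (hypothetical)
lon = np.linspace(0.0, 360.0, 37)
field = np.outer(np.cos(np.deg2rad(lat)), np.ones(lon.size))  # toy field

sample = RegularGridInterpolator((lat, lon), field)

stations = np.array([[53.6, 10.0],    # irregular station locations,
                     [51.5, 0.1]])    # coordinates invented for illustration
model_at_stations = sample(stations)  # values comparable to in situ data
```

Only after this step do model output and point observations live in the same space and become directly comparable.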
Doing statistics across the two worlds of observations and model results often leads to the assumption that they are completely different things. On the one side there are the observations, where real people went into the field, drilled, dug and measured, and delivered the pure truth about the world we want to describe. On the other side stands the clean laboratory of a computer, which takes all our knowledge and creates a virtual world. This world need not necessarily have anything to do with its real counterpart, but at least it delivers nice information and visualisations. Yet this contrast between the dirty observations and the clean models usually exists only in our heads; in reality the two are much more connected.