Post-processing paper background: Why do we need verification with uncertain observations?

Data verification is one of the cornerstones of geoscience. Without knowing whether a prediction has been correct, it is not possible to claim that we can predict anything at all. Most verification nowadays rests on the assumption that observations are perfect, often without acknowledging any uncertainties. Standard tools like contingency tables and correlations (the latter often used in some form in long-term predictions) make it hard to take uncertainties into account (even where this is possible, e.g. through sampling strategies).

Another problem is that obtaining uncertainties for observations is often not an easy task. Reanalysis data are an example: for a long time they were provided only as a single realisation. As a result, while predictions were often available as ensembles, the observations to compare them with were not. There are techniques that use aggregated data and validate its statistics, but the verification of most classical variables is still often done with observations treated as certain. Currently the field is changing. Reanalyses are starting to become available as ensembles, so in the future we need new tools that make use of these developments.

On the philosophical side, too, there is a growing need to look into verification with uncertain observations. We know that the real world is not deterministic, we know that our instruments are imperfect, and we are sure that these uncertainties matter. Why do we train our students in estimating and measuring uncertainties when we later do not use them in our analysis? And yes, there is the issue that all observations are, at their core, models. We acknowledge that models are imperfect, otherwise we wouldn’t need ensembles for creating predictions. So why do we not take care of the uncertainties that arise from applying those models when we create observations? Those models are certainly not much better (they are just applied on a different temporal and spatial scale). We have to confront this issue in every step we take. We do that in data assimilation, so we have to do it in data verification as well.

Therefore, new developments in this field are essential. We need new tools to look into uncertain observations and make use of them. This paper is a small step towards opening up opportunities for future developments in this direction. It is certainly not a final solution and certainly not the first step. It is just another proposal for a tool to approach this challenge. In the future we will require well-understood and well-tested tools that can be applied by the broader scientific community. What those might look like is currently open, as is the question of whether the tools presented here are of any wider use. In the paper I described two metrics, the EMD and the IQD, and developed a strategy for building verification tools with them. In the next post I will take a deeper look into the two metrics and shine a light on the opportunities they offer.
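
To give a rough feeling for what comparing two ensembles can mean in practice, here is a minimal sketch. It assumes that EMD stands for the Earth Mover's Distance and IQD for the Integrated Quadratic Distance, both evaluated in one dimension on the empirical distribution functions of a prediction ensemble and an observation ensemble. The function names are hypothetical and this is not the implementation from the paper, only an illustration of the general idea.

```python
import numpy as np


def _ecdf_difference(a, b):
    """Return the difference of the two empirical CDFs on each interval
    between consecutive pooled sample values, plus the interval widths."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.sort(np.concatenate([a, b]))
    # Both ECDFs are step functions, constant on [grid[i], grid[i+1]).
    cdf_a = np.searchsorted(a, grid[:-1], side="right") / a.size
    cdf_b = np.searchsorted(b, grid[:-1], side="right") / b.size
    return cdf_a - cdf_b, np.diff(grid)


def emd(a, b):
    """Assumed definition: integral of |F_a(x) - F_b(x)| dx."""
    diff, width = _ecdf_difference(a, b)
    return float(np.sum(np.abs(diff) * width))


def iqd(a, b):
    """Assumed definition: integral of (F_a(x) - F_b(x))**2 dx."""
    diff, width = _ecdf_difference(a, b)
    return float(np.sum(diff**2 * width))


# Toy example: a two-member "prediction" against a one-member "observation".
prediction = [0.0, 2.0]
observation = [1.0]
print(emd(prediction, observation))  # 1.0
print(iqd(prediction, observation))  # 0.5
```

Both functions accept ensembles of different sizes, so an observation ensemble does not need to have as many members as the prediction ensemble.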

Observations and reanalyses: Our shaky reference

For everyone working on data analysis in climate science, using references is essential. These references, representing some form of truth, are often the target that models have to reach. Verification methodologies (called validation outside meteorology) evaluate the results against the references and, depending on the methodology, deliver good results when the model is close to the reference, matches its variability or is close in other statistical parameters. The power of these references in such analyses, and in defining our knowledge about the world, is immense, so it is essential that they really have something to do with the things we see outside our windows.

Last month Wendy Parker published a paper named “Reanalyses and Observations: What’s the Difference?” and looked at the references from a more philosophical point of view. She listed four points that critically examine the connection between references and observations, and in this post I would like to take a look at them.


Big Data – More risks than chances?

There is an elephant in the room, at every conference in nearly every discipline. The elephant is so extraordinary that everyone seems to want to watch and hype it. In all this commotion a lot of common sense seems to get lost, and the little mice creeping around the corners in particular are overlooked.

The big topic is Big Data, the next big thing that will revolutionise society, at least if you believe the advertisements. The topic has grown in the past few years into something really big, especially as the opportunities it offers are regularly demonstrated by social media companies. Funding agencies and governments have seen this and put Big Data at the top of their science agendas. One consequence is masses of scientists sitting in conference sessions about Big Data, with discussions ranging from what it is to how it can be used. Nevertheless, there are a lot of traps in this field, which might have serious consequences for science in general.

The role of statistics in science

Traditionally, within the different disciplines of earth science, scientists are divided into two groups: modellers and observationalists. In this view the modellers are those who do theory, possibly with pen and paper alone, and the observationalists go into the field and get their hands dirty. That this view is a little outdated will not be news to anyone. In my opinion, it was really with the establishment of remote sensing that the two groups started to reunite (yes, reunite, because in the old days there were a lot of scientists who did everything). As a trained meteorologist, I find it quite normal that this division no longer really exists. Both types of scientists sit in front of their computers, both are programming and both have to write papers with a lot of mathematical equations. In other fields the division might still be more obvious (e.g. geology), but for many it’s only the type of data someone works with that classifies them as an observationalist or a modeller.

The sampling issue

Observations are generally a tricky thing. Not only are they a special kind of model, which tries to capture a sometimes very complicated laboratory experiment, they also represent the truth, as far as we are able to measure it. As a consequence they play a really important part in science, but in some fields they are hard to generate.

During the PALSEA2 meeting a question came up in the context of generating paleo-climatic sea-level observations:

Assuming your resources allow only two measurements, is it better to place them close to each other or far apart?

In the heat of the discussion both sides were taken, but in the end the conclusion was the typical answer to this kind of question: “it depends on what you want to measure”.

Observations represent the truth, models…

Last year, during a larger meeting, I made a comment that made a lot of attendees shake their heads and others just smile. The statement was:

“Observations represent the truth, models the state of our understanding.”

As I have said before, at first sight it is of course rubbish that observations have anything to do with the truth. Indeed, truth is a great word with many different meanings and implications. In the context above, “truth” (which should always be set between quotation marks anyway) describes the best possible estimate of the real world with the currently available technology in real-world situations. When I personally write things up, I usually use a measurement operator to make it clear that observations are never able to describe the full reality. However much effort observers put into it (and they usually do an amazing job), the real state of a physical system can only be approximated.

Drawing a line between models and observations

In my last post I showed that observations are models as well. But if this is the case, why do we distinguish between these two kinds of data the way we do? Why is everyone so keen on observations, when they are just another model output?

The reason can usually be found in their different structure. For something still to be called an observation, the amount of modelling applied to it should usually be very basic. Coming from the atmospheric sciences myself, I find that the border between the two worlds can often be drawn by the type of the data. Generally the observations in that field are point data, often in situ data, which are irregular in time and space. In contrast, model data is usually very regular and sometimes high-dimensional.
