The sampling issue

Observations are generally a tricky thing. Not only are they a special kind of model that tries to capture a sometimes very complicated laboratory experiment; they also represent the truth, as far as we are able to measure it. As a consequence, they play a really important part in science, but in some fields they are hard to generate.

During the PALSEA2 meeting, a question came up in the context of generating paleo-climatic sea-level observations:

Assuming your resources allow only two measurements, is it better for them to be close to each other, or should they be far apart?

In the heat of the discussion both sides were taken, but in the end the conclusion was the typical answer to this kind of question: “it depends on what you want to measure”.

My personal answer is that it depends on how the observer (or a modeller) judges the spatial and temporal representativeness of the measurement. When it is assumed that both observations measure the same physical system, then it is of course a good idea to place the second observation further away, as you want to learn something about the regional or perhaps global variability. The opposite case is when you do not know whether the two observations are able to measure the same system. This question usually comes up when the kind of observation is relatively new in a field and the methodologies are not yet well established. In this context it is of course useful to have two observations relatively close to each other, perhaps even at the same site. The latter minimises the problems that always occur when you compare different sites. But why is it important to know the variability of the physical system we want to measure on different scales? Well, this variability, in some fields also called internal variability, is an important baseline for the estimation of uncertainties. When we are not able to quantify it, we cannot extrapolate the results to a broader scale.

Let’s illustrate this with an example. As the topic was sea-level observations, we assume in the following that we have a measurement of sea level at a given location. I will ignore the temporal component for the moment, as the argument can be made in an analogous way. So we, as observers, have decided to make an observation at one point on the coast, because we think that the conditions at this place meet our expectations for a good measurement point. In the next step we have to decide how representative our observation is. Is the measurement result the same around the globe? In the same region? At the next bay? Or even at the same point? When we do not know the answer to this question, it is better to establish this relationship first, as a proper analysis of measurements that are far apart requires some form of assumption about the relationship between these points. Too much simplification therein leads to problematic results.
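The trade-off between a near pair and a far pair can be sketched numerically. The following is only a toy illustration under stated assumptions, not a method from the meeting: the sea-level anomaly along the coast is assumed to be Gaussian with an exponential spatial correlation exp(-distance / length_scale), and each observation carries independent measurement noise. The function names and all numbers (length scale, standard deviations) are hypothetical.

```python
import numpy as np


def expected_pair_difference_variance(distance_km, length_scale_km, field_sd, noise_sd):
    """Variance of (obs_1 - obs_2) for two sites separated by distance_km.

    Assumption (hypothetical): exponential spatial correlation and
    independent measurement noise at both sites.
    """
    correlation = np.exp(-distance_km / length_scale_km)
    signal_part = 2.0 * field_sd**2 * (1.0 - correlation)  # real spatial variability
    noise_part = 2.0 * noise_sd**2                          # measurement repeatability
    return signal_part + noise_part


def simulate_pair_differences(distance_km, length_scale_km, field_sd, noise_sd,
                              n_draws, seed=0):
    """Monte Carlo draws of obs_1 - obs_2 under the same toy assumptions."""
    rng = np.random.default_rng(seed)
    correlation = np.exp(-distance_km / length_scale_km)
    covariance = field_sd**2 * np.array([[1.0, correlation],
                                         [correlation, 1.0]])
    # True (unobserved) sea-level anomalies at the two sites.
    true_levels = rng.multivariate_normal(np.zeros(2), covariance, size=n_draws)
    # Add independent measurement noise to each site.
    observations = true_levels + rng.normal(0.0, noise_sd, size=(n_draws, 2))
    return observations[:, 0] - observations[:, 1]


if __name__ == "__main__":
    # Hypothetical values: 500 km correlation length, 0.3 m field spread,
    # 0.1 m measurement noise.
    for distance_km in (1.0, 50.0, 5000.0):
        analytic = expected_pair_difference_variance(distance_km, 500.0, 0.3, 0.1)
        sampled = simulate_pair_differences(distance_km, 500.0, 0.3, 0.1, 20000).var()
        print(f"{distance_km:7.1f} km  analytic={analytic:.4f}  simulated={sampled:.4f}")
```

In this toy setting the difference between two nearby observations is dominated by the measurement noise, so a near pair mainly tells you how repeatable the measurement is. For a far pair the difference is dominated by the spatial variability of the field, but it can only be interpreted that way if the noise term is already known.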

And of course observations can vary even when they are repeated at the same location. There are not only the statistical uncertainties, which we have to take into account anyway. There can also be changes over time, changes in the measurement technique, or other forms of bias. A proper evaluation of these influences leads to fields like homogenisation and to potential correction factors that make observations more comparable with each other.

All in all, the importance of the sampling uncertainty, and of quantifying it, is often neglected. Especially when resources are limited, it is hard to justify observations right next to each other, because large-scale science is often preferred. And that is in general all right, as people who use the data have to trust the observers to make the right decisions and to be aware of these problems (which is the case in most fields). But everyone has to be aware that the two scenarios try to answer two different scientific questions, and both need to be answered if the whole physical system is to be understood. As a consequence, more resources are necessary to increase the number of observations, so that not only the large-scale but also the small-scale variability of the system can be quantified.

2 thoughts on “The sampling issue”

  1. You already mentioned homogenization, the removal of non-climatic changes from climatic time series. For homogenization it is always important to have nearby measurements (not just in the beginning phase when one wants to understand the measurement), because the non-climatic changes are detected by comparison with neighbours. Thus nearby, well-correlated neighbours are important.

    If we go beyond just two samples, I would prefer to take measurements in clusters of stations (at least 3, but maybe some more). These clusters can then be spread again over the entire area of interest. This also makes it easier to estimate the spatial structure over a large range of scales. Unfortunately, people tend to spread their samples uniformly over the area of interest.

  2. Having many more observational points is certainly preferred. Nevertheless, in the case of climate stations we have the problem of changes in the physical conditions. I do not mean global warming, but changing instruments or changing measurement conditions, which influence the laboratory itself. From a statistical point of view it is very critical in temperature measurements to say where you want to have the two stations, because the question of how representative one measurement actually is, is definitely not a simple one. And answering it without a lot of measurements is even more complicated.
