Post-processing paper background: Why does sub-sampling work?

One main aspect of the new paper is the question of why sub-sampling works. In many review rounds for the original paper (Dobrynin et al. 2018) we received questions about a proper statistical model of the method, and many claims that it should not work even though it does (i.e. that it is cheating). This is where the new manuscript comes into play. Instead of selecting a (more or less arbitrary) number of ensemble members close to one or more predictors, everything is transferred to probability density functions (pdfs). Of course, these are not easily available without making a large number of assumptions, so I went the hard way. Bootstrapping EOF fields is certainly not cheap in terms of computational cost, but it works. It provides a pdf for every ensemble member and every predictor, as well as for the observations of the North Atlantic Oscillation (NAO).
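To make the bootstrapping idea a bit more concrete, here is a minimal sketch in Python with NumPy. It is one plausible reading, not the code used in the paper, whose exact set-up (area weighting, number of EOFs, resampling scheme) is not described here. It assumes a single anomaly field arranged as (time, grid point), takes the leading EOF from an SVD, resamples the time steps with replacement, and projects a target field (for instance one ensemble member's anomaly) onto each bootstrapped EOF. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def leading_eof(anomalies):
    """Return the leading EOF (unit-norm spatial pattern) of a (time, space) array."""
    # Remove the time mean at every grid point before the decomposition.
    centred = anomalies - anomalies.mean(axis=0)
    # The first right singular vector of the centred data is the leading EOF.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]


def bootstrap_index(anomalies, target_field, n_boot=500, seed=0):
    """Bootstrap sample of the projection of target_field onto the leading EOF."""
    rng = np.random.default_rng(seed)
    reference = leading_eof(anomalies)
    n_time = anomalies.shape[0]
    samples = np.empty(n_boot)
    for b in range(n_boot):
        # Resample the time steps with replacement and recompute the EOF.
        idx = rng.integers(0, n_time, size=n_time)
        eof = leading_eof(anomalies[idx])
        # EOF signs are arbitrary, so align every replicate with the reference pattern.
        if np.dot(eof, reference) < 0:
            eof = -eof
        samples[b] = np.dot(target_field, eof)
    return samples


# Synthetic example (hypothetical): one dipole-like pattern plus small noise.
rng = np.random.default_rng(42)
pattern = np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
pattern /= np.linalg.norm(pattern)
amplitudes = rng.normal(size=60)
anomalies = np.outer(amplitudes, pattern) + 0.1 * rng.normal(size=(60, 50))
target_field = 1.5 * pattern  # stands in for one ensemble member's anomaly field

samples = bootstrap_index(anomalies, target_field)
# A histogram (or a kernel density estimate) of the samples approximates the pdf.
density, edges = np.histogram(samples, bins=30, density=True)
print(samples.mean(), samples.std())
```

The sign alignment matters: without it, replicates would randomly flip between positive and negative projections and the resulting pdf would be bimodal for purely technical reasons.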

Based on these pdfs, it is now possible to look for the reason why the sub-sampling method has better prediction skill than the case without sub-sampling. The first step is to show that the distribution view and the sub-sampling give at least similar results. In the end, using pdfs is not a pure selection but rather a weighting: ensemble members close to a predictor are weighted higher than those far away from it. Of course there are differences between the two approaches, but the results are remarkably similar. This gave us more confidence in what the many tests of the sub-sampling methodology we did in the past suggested: the exact way in which we select does not have a huge influence (this will be explained in detail in an upcoming paper). Consequently, if we can show how the pdf approach works, we also gain insight into the sub-sampling approach itself.
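The difference between selecting and weighting can be illustrated with a toy example. The sketch below assumes one scalar index value (for example an NAO value) per ensemble member and a Gaussian predictor pdf with a given mean and standard deviation. The Gaussian shape, the function names and the numbers are assumptions for illustration, not the formulation used in the paper. Sub-sampling keeps the `n_select` members closest to the predictor, while the pdf view gives every member a weight proportional to the predictor pdf at its value.

```python
import numpy as np


def subsample_mean(members, predictor, n_select):
    """Sub-sampling: mean of the n_select members closest to the predictor value."""
    distance = np.abs(members - predictor)
    closest = np.argsort(distance)[:n_select]
    return float(members[closest].mean())


def weighted_mean(members, predictor, predictor_sd):
    """PDF view: weight each member by a Gaussian predictor pdf at the member's value."""
    weights = np.exp(-0.5 * ((members - predictor) / predictor_sd) ** 2)
    weights /= weights.sum()  # the pdf's normalisation constant cancels here
    return float(np.sum(weights * members))


# Toy ensemble of index values and a predictor centred on 1.0 (illustrative numbers).
members = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 3.0])
predictor = 1.0

full = float(members.mean())
selected = subsample_mean(members, predictor, n_select=3)
weighted = weighted_mean(members, predictor, predictor_sd=0.5)
print(full, selected, weighted)
```

Members far from the predictor receive almost no weight, so the weighted mean ends up close to the sub-sampled mean. A wider `predictor_sd` makes the weighting softer, while the hard cut-off of the selection corresponds to the opposite extreme.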

The new paper shows that the key to understanding the mechanism is understanding the spread. While the ensemble mean of a seasonal prediction has acceptable correlation skill, the prediction of each single ensemble member is rubbish. Consequently, the overall ensemble has a huge spread of rather uninformed members. We have learned in the past to work with such problems, which requires great care in how predictions on long timescales are evaluated. Filtering this broad spread, and with it the highly variable distribution function, with informed and sharper predictor functions sharpens the combined prediction and at the same time improves the prediction overall. In other (simplified) words: we down-weight the influence of those ensemble members that have drifted away from the correct path and concentrate on those that are consistent with the overall state of the climate system.
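Why combining a broad distribution with a sharper one narrows the result can be seen in the simplest possible case of two Gaussian pdfs. Assuming (for illustration only) that the ensemble pdf and the predictor pdf are Gaussian and independent, their normalised product is again Gaussian: the precisions (inverse variances) add, and the mean is the precision-weighted average of the two means. The function name and the numbers below are hypothetical.

```python
import math


def combine_gaussians(mu_ens, sd_ens, mu_pred, sd_pred):
    """Normalised product of two Gaussian pdfs: returns (mean, standard deviation)."""
    prec_ens = 1.0 / sd_ens**2    # broad, weakly informed ensemble
    prec_pred = 1.0 / sd_pred**2  # sharper, better informed predictor
    prec = prec_ens + prec_pred   # precisions add, so the result is sharper than both
    mu = (prec_ens * mu_ens + prec_pred * mu_pred) / prec
    return mu, math.sqrt(1.0 / prec)


mu, sd = combine_gaussians(mu_ens=0.2, sd_ens=1.0, mu_pred=0.8, sd_pred=0.5)
print(mu, sd)
```

Here the combined mean (about 0.68) is pulled towards the sharper predictor, and the combined standard deviation (about 0.45) is smaller than both input spreads. Real ensemble and predictor pdfs need not be Gaussian, so this only illustrates the direction of the effect.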

As a consequence, the resulting prediction is quite similar in its properties to a statistical prediction, but still retains many advantages of a dynamical prediction. It is probably not the best of both worlds, but it is an acceptable compromise. To establish this, however, we need tools to evaluate the predictions, and that proved to be harder than expected. But that is the story for the next post, on why we need verification tools for uncertain observations.

IMSC 2019: The final day

With a half day of talks, IMSC 2019 ended today in Toulouse. It was again quite a warm day, and not far away the all-time French temperature record was broken today. So it was fitting that the final day started with event attribution talks, which covered among other things heat waves and their attribution to climate change. The next session was the final parallel session, and I stayed in the event attribution session. It addressed further events and discussed the limits of these techniques.

After lunch the conference officially ended, and it was time to look back at the past days here in southern France. The conference was well organised and met expectations: good food, good location, interesting scientific content. The main topics were extremes, and detection and attribution. These took up quite a big chunk of the conference and, for my taste, pushed the other topics a bit too far into a corner. The biggest issue in verification and forecast evaluation was the handling of uncertain observations. Apart from that, the conference covered good statistical practice, some talks about data, and many good discussions on statistical topics. So it was fun to join this conference again, even with the very hot weather. Let’s see where the next conference will be, in three years or so.

IMSC 2019: Halftime and the heat wave is coming

The third day at IMSC 2019 in Toulouse is over, and it was a day full of presentations. The first session was on the interaction of humans with climate and on space-time statistics. The main topics were the construction of indices to communicate the severity of climate-related hazards to experts and the public, and transitions between weather regimes. It was followed by talks on detection and attribution, another main topic at this conference. An interesting topic was the influence of a scientist's point of view on the statistical results, which highlighted the subjectivity of statistics.

After lunch, which for the first time this week was served inside because of the weather (I assume more because of the wind than the heat), the conference split again into parallel sessions. I switched between the space-time statistics and the long-term D&A sessions, which covered among other things the complex uncertainty structures of regression models and the classification of weather regimes.

Tomorrow will be a challenging day: we expect temperatures of around 40 degrees, there will be a poster session, and I will give my own talk. So it will certainly be exciting. Oh, and not to forget, some outdoor activity is planned for the evening, which in this heat will certainly be a special experience.

IMSC 2019: First poster session

On the second day at IMSC 2019 in Toulouse, we started with three plenary talks on big data and extreme value analysis, covering interesting topics around the reconstruction of observational data and how to do event attribution properly. After these talks, a well-attended poster session took place. Interesting posters on the topics of the last two days led to many discussions on and beyond the poster themes.

[Photo: Capitol by night. The late end of the dinner led to nice views of the city.]

After lunch the next topic was changes in extremes. Again, three talks showed how complex the estimation of extremes is in a changing world. Afterwards we split again into the smaller rooms for the parallel sessions, where I attended the space-time statistics one. Its focus was on emulators and the determination of significance.

After the sessions ended and a short break, the final event of the day was the conference dinner. Good French food and drinks made for many interesting discussions. With this, a long day ended, and from tomorrow on we all expect the heat wave to start.

IMSC 2019: Here we go Toulouse!

It is my second time at the International Meeting on Statistical Climatology (IMSC), after attending the previous conference in Canmore, Canada. This time it is hosted by Meteo France in Toulouse in southern France. The main topic of the week will not necessarily be statistics, but rather the weather, with a little heat wave forecast for later this week.

But before we get to that, we start with statistics. After we collected our badges, the conference kicked off with a look at homogenisation and machine learning. The latter in particular will most likely be a big topic at this conference. Due to its increased visibility in data science applications, the climate community is also giving machine learning a go and applying it to various problems.

After the plenary session on these topics, the conference switched to three parallel sessions, which will also be the pattern for the rest of the week. I chose the changes in extremes session and went on to some homogenisation talks in the second part. For the final session I visited the big data session, which showed several statistical approaches to analysing large datasets of different kinds. The day ended with the obligatory ice breaker.

We will see how the topics evolve over the week. Today I was surprised to hear about topological approaches in several talks. In connection with machine learning, some obviously see them as a big thing. And yes, since we are in France, I am looking forward to great food. Today was already a good start: finger food in many different varieties was on offer during the session breaks.