EGU 2017: Complications of interdisciplinarity

The third day of the EGU is over, and my day was busier than yesterday. It started with a look into a sea-ice session with an interesting view on predicting its decline. The key is not to treat time as the decisive variable, but rather the development of greenhouse gases in the atmosphere. For the second half of the first session I went to a more applied geological session, which mainly asked how boulders get onshore. Quite interesting were the implications for the potential storm climate during the last interglacial. In the second session I paid a visit to precipitation, its retrieval and the resulting products. Precipitation is one of the most complicated variables to predict as well as to measure, and it therefore always has interesting developments to offer.

After lunch my next stop was again a medal lecture, this time on chaos, and the presenter had some really nice examples. The rest of the session I spent on ENSO, before I decided to visit the open session on ocean science. Some good talks, for example on the uncertainty of deep ocean heat content, made it an interesting session. The final item of the day was, as always, the poster session.

Conferences like the EGU are always great for researchers like me who prefer to look into different fields (I personally focus on the development of statistical methodologies, which does not require sticking to one field). Unfortunately, this makes the problem of deciding what you would like to see even harder. While schedulers often take care to provide a consistent schedule within one discipline (even if it does not work every time), following several different divisions needs some extra care. Looking at the first three days, I have visited sessions of the following divisions (counting only the first division listed for each session): OS, GM, G, CL, AS, GI, CR, NP and NH. I am not quite sure which division I belong to myself, but I have learned that it would be simpler to stick to one division only. Often the computer systems and apps are not designed to assist in searching the sessions of many (or all) divisions, and it requires some extra work to do it properly. There is always a session you feel you have missed. Anyway, it is worth the effort, and everyone has problems getting their ideal schedule done. The current app is a nice feature, but the question remains how it can be improved to really assist every type of scientist at such a huge conference.

Massive ensemble paper background: What will the future bring?

In my final post on the background of the recently published paper, I would like to take a look into the future of this kind of research. Basically it highlights again what I have already written on different occasions, but putting it together in one post might make it clearer.

Palaeo-data on sea-level and its associated datasets are special in many regards. That is what I wrote in my background post to the last paper, and it means that several problems occur when these datasets are analysed. As I structured the problems into three fields within the paper, I would like to do the same here.

The datasets and their basic interpretation are the most critical point, and it is where I expect the greatest steps forward in the next years. Some papers came out recently that highlight problems such as the interpretation of coral datasets. We have to make progress in understanding the combination of mixed datasets, and this can only happen when future databases advance. This will be an interdisciplinary effort and therefore challenging for all involved.

The next field is the models. The analysis is currently done with simple models, which has its advantages and disadvantages. New developments are not expected immediately, so organising the development and sharing the results of the models will be the major issue in the near future. New ideas about the ice sheets and their simple modelling will also be needed for approaches similar to the one we used in this paper. Statistical modelling is fine up to a point, but it has shortcomings when it comes to the details.

The final field is the statistics. Handling sparse data with multidimensional, probably non-Gaussian uncertainties has proven to be complicated. New statistical methodologies need to be developed that are simple enough for every involved discipline to understand, but also powerful enough to solve the problem. In our paper we did our best to develop and use a new methodology to achieve that, but different approaches are certainly possible. So creativity is needed to generate methodologies that deliver not only a value for the different parameters of interest, but also good and honest uncertainty estimates.

Only when these three fields develop further can we really expect to advance our insights into the sea-level of the last interglacial. This development will not happen quickly, but I am sure that the possible results are worth the effort.

Massive ensemble paper background: What can we say now on the LIG sea-level?

Now that the new paper is out, it is a good time to think about the current status of the main question it covered: the sea-level during the LIG. Usually I do not want to generalise too much in this field, as there is currently a lot going on: many papers are in preparation or have just been published, and the paper we have just published was originally submitted one and a half years ago. Nevertheless, some comments on the current status might be of interest.

So the main question that most papers on this topic cover is: How high was the global mean sea-level during the last interglacial? There were some estimates in the past, but when you ask people who work on this topic, most will answer: more than six metres higher than today. That is of course an estimate with some uncertainty attached to it, and currently most expect that it was not much more than about nine metres higher than today. There are several reasons for this estimate, but we can at least say that we are quite sure that it was higher than at present. From my understanding, geologists are quite certain that this is true at least for some regions, and even though the data are sparse, meaning the number of data points is low, it is very likely that this was also the case for the global mean. Whether it was 5, 6 or 10 metres higher is a more complicated question. It will need more evaluation until we can make more certain statements.

Another question on this topic concerns the start point, end point and duration of the highstand. This question is very complex, as it depends on definitions and on the problem that in many places only the highest point of sea-level over the duration of the LIG can be measured. That makes it very hard to say something definitive, especially on the starting point. As such, our paper did not really make a statement on this, as it just shows that data from boreholes and from corals currently do not give the same answer.

The last question everybody asks concerns the variability of the sea-level during the LIG. Was it just one big up and down, or were there several phases with a glaciation phase in the middle? Or were there even more than two phases? Hard questions. The most reliable statements say that there were at least two phases, while from my perspective our paper shows that it is currently hard to make any statement based on the data we used. But here too, new data might give us the chance to make better statements.

So there are still many questions to answer in this field, and I hope the future, about which I will write in my last post on this topic, will bring many more insights.

Massive ensemble paper background: Data assimilation with massive ensembles

Within the new paper we developed and modified a data assimilation scheme based on simple models and, up to a point, Bayesian statistics. In the last post I talked about the advantages and purposes of simple models, and this time I would like to talk about their application.

As already mentioned, we had a simple GIA model available, which was driven by a statistical process that creates ice sheet histories. From the literature, we had the guideline that the sea level over the past roughly followed the dO18 curve, but that large deviations from it in variation and values can be expected. As always in statistics, there are several ways to perform a task, based on different assumptions. To provide a contrast to the existing literature, we focused on an ensemble-based approach. Our main advantage here is that we end up with individual realisations of the model run and can show individually how they perform compared to the observations.

The first step in designing the experiment is the question of how to compare a model run to the observations. As there were several restrictions from the observational side (limited observations, large two-dimensional uncertainties etc.), we decided to combine Bayesian statistics with a sampling algorithm. The potentially large number of outliers also required us to modify the classical Bayesian approach. As a consequence, we were able at that point to estimate a probability for each realisation of a model run.
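
To make the comparison step more concrete, here is a minimal Python sketch of one way such a probability could be computed. It is not the scheme from the paper. The observation layout (`time`, `time_sigma`, `value`, `value_sigma`), the Gaussian shapes, the flat outlier density and the parameters `outlier_fraction` and `value_range` are all assumptions made for illustration. The age uncertainty is handled by sampling, and a small flat component keeps a single outlier from vetoing an otherwise good run.

```python
import math
import random


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution (an assumed uncertainty shape)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def observation_likelihood(model_curve, obs, rng, n_samples=500,
                           outlier_fraction=0.1, value_range=50.0):
    """Monte Carlo likelihood of one observation given one model run.

    model_curve: function mapping time to modelled sea level
    obs: dict with keys time, time_sigma, value, value_sigma (hypothetical layout)
    """
    total = 0.0
    for _ in range(n_samples):
        # Sample the uncertain age of the observation ...
        t = rng.gauss(obs["time"], obs["time_sigma"])
        # ... and evaluate the value uncertainty at the sampled age.
        total += gaussian_pdf(obs["value"], model_curve(t), obs["value_sigma"])
    inlier = total / n_samples
    # Outlier tolerance (assumed form): mix with a flat density over value_range.
    return (1.0 - outlier_fraction) * inlier + outlier_fraction / value_range


def run_log_likelihood(model_curve, observations, seed=0):
    """Sum of log-likelihoods over all observations for one model run."""
    rng = random.Random(seed)
    return sum(math.log(observation_likelihood(model_curve, o, rng))
               for o in observations)
```

Runs can then be ranked by `run_log_likelihood`. Because of the flat component, an observation far away from the model curve lowers the score only by a bounded amount.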

Next, the experimental design needed a general strategy for creating the different ensemble members so that they are not completely random. Even with the capability to create a lot of runs, realisations in the order of 10,000 runs are not sufficient to determine a result without such a strategy. This led us to a modified form of a Sequential Importance Resampling Filter (SIRF). The SIRF uses a round-based approach. In each round a number of model realisations are calculated (in our case 100) and afterwards evaluated. A predefined number of them (we used 10), the best performers of the round, are taken forward to the next round and act as seeds for the new runs. As we wanted a time-wise determination of the sea-level, we arranged the rounds along the time dimension. Every couple of years (more often in important time phases like the LIG) a new round was started. In each round the new ensemble members branched from their seeds with anomaly time series for their future development. Our setup required that we always calculate and evaluate full model runs. To prevent very late observations from driving our whole analysis, we restricted the number of observations taken into account in each round. All these procedures led to a system where in every round, and thus at every time step of our analysis, the ensemble had the opportunity to choose new paths for the global ice sheets, deviating from the original dO18 curve.
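
The round-based procedure described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the code used for the paper: the function names, the list-of-numbers "model run", the toy observations and the Gaussian anomalies are all hypothetical. It also keeps the top performers deterministically, as described in the post, whereas a textbook sequential importance resampling filter draws the survivors in proportion to their weights.

```python
import random


def run_rounds(simulate, score, n_rounds, first_guess,
               n_members=100, n_keep=10, seed=0):
    """Branch members from seeds, evaluate them, keep the best, repeat."""
    rng = random.Random(seed)
    seeds = [first_guess]  # the first round starts from a single first guess
    for round_index in range(n_rounds):
        members = []
        for m in range(n_members):
            parent = seeds[m % len(seeds)]  # spread members evenly over the seeds
            members.append(simulate(parent, round_index, rng))
        # Rank by score (higher is better); the best performers seed the next round.
        members.sort(key=lambda member: score(member, round_index), reverse=True)
        seeds = members[:n_keep]
    return seeds


# Toy stand-ins for the model and the observations (purely illustrative).
TOY_OBS = [0.0, 2.0, 4.0, 3.0, 1.0]


def toy_simulate(parent, round_index, rng):
    """Return a full run that branches from its parent at the current round."""
    child = list(parent)
    for t in range(round_index, len(child)):
        child[t] += rng.gauss(0.0, 1.0)  # anomaly for the future development
    return child


def toy_score(member, round_index):
    """Negative squared misfit, using only observations up to the current round."""
    return -sum((member[t] - TOY_OBS[t]) ** 2 for t in range(round_index + 1))


best = run_rounds(toy_simulate, toy_score, n_rounds=5, first_guess=[0.0] * 5)
```

With the defaults of 100 members and 10 survivors per round, `best` holds the ten surviving full runs after the last round, ordered from best to worst. Each member is always a complete run, and `toy_score` limits the observations per round so that late data cannot steer early decisions.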

As you can see above, there were many steps involved, which made the scheme quite complicated. It also demonstrates that standard statistics reaches its limits here. Many assumptions are required, some simple and some tough ones, to generate a result. We tried to make these assumptions and our process as transparent as possible. As such, our individual realisations, based on different model parameters and assumptions on the dO18 curve, show that it is hard to constrain the sea-level with the underlying datasets for the LIG. Of course we get a best ice-sheet history under our conditions, as that is how our scheme is designed, but it is always important to evaluate whether the results we get out of our statistical analysis make sense (basically, whether the assumptions hold). In our case we could say that there is a problem. It is hard to say whether the model, the observations or the statistics itself contributes the largest part of it, but the observations are the prime candidate. Reasons are shown in the paper, together with much more information and discussion on the procedure and assumptions.

Massive ensemble paper background: Massive ensembles: How to make use of simple models?

The new paper on the LIG sea-level investigation with massive ensembles analyses simple models. In this post I want to talk a bit about their importance and how they can be used in scientific research.

Simple models are models with reduced complexity. In contrast to complex models, their physics is simplified, they are more specialised for a specific problem, and their results are not necessarily directly comparable to the real world. They can have a smaller, easier to maintain code base, but a simple model can also grow quickly in lines of code. What defines a simple model is the processes it includes, not the number of lines of code.

Massive ensemble paper background: Sea-level in the LIG: What are the problems?

In the new paper on the LIG sea-level investigation with massive ensembles, I try to demonstrate how complicated it is to actually model the LIG sea level. There are many reasons for this, and they are certainly not unique to this specific problem, but apply to paleoclimatology in general. So I would like to highlight a few specifics which I encountered in the preparation of this paper.

I have written before on the special nature of palaeo-sea-level data in general. From the statistical point of view, the available data are inhomogeneous because they have different origins and are based on different measurement principles (e.g. analysis of data from corals or boreholes). Handling their two-dimensional uncertainties (time and value), which are usually also quite large, makes it complicated to apply standard statistical procedures. Far too many procedures assume that at least one of the two dimensions is negligible, and when problems with non-normal uncertainty curves are added, it poses a real challenge. It is also nearly certain that the dataset contains outliers. There is no clear way to identify these, so whether any given datapoint is valid is unclear. Finally, we have to accept that there is hardly any real check to find out whether the outliers identified by your statistical method are just false measurements or show special features of the physical system. And of course, a huge problem is that the number of available data points is very low, which makes it even harder to constrain the sea-level at a specific point in time.

Another point of concern is the combination of two complex systems, which only together give a result that is comparable to the observations. On the one side there are the ice sheets. It is hard to put physical constraints on their spatial and temporal development (especially in simple models). We tried it with assumptions on their connection to the current ice sheets (or better, those of the past few thousand years), but it is hard to tell what the ice sheets at that time really looked like. Of course there are complex model studies on this, but what studies like ours need are consistent ice sheets at a relatively high temporal resolution (e.g. 500 years) over a very long time (more than 200,000 years). In addition, we would like to have several possible realisations of them (I used 39,000 runs, the majority of which have unique ice sheet histories). That is at least a challenge, which makes statistical ice sheet creation a necessity.

The other complex model is the Earth. It reacts to the ice sheets and to their whole history. So the combination of these two models (the statistical ice sheet model and the physical GIA model) is key to a successful experiment. We handle simple models here, which always have their benefits, but also their huge disadvantages. I will talk more about that in the next post on this topic. But in this case these models are special. At least in theory they are non-Markovian, which means that not only the last state and the changes to the system since then play a role, but also system states from longer ago. Furthermore, future states also play a role, but they have a much smaller influence. This has a lot to do with the experimental setup, but it puts constraints on what you can do with your analysis procedures. It also requires that you analyse a very long period of development, in our case the last 214,000 years, even when you are just interested in what happened at around 130,000 years before present.

Another factor in this is the so-called delta 18O curves. We use them to create a first guess of the ice sheets, which we afterwards varied. Nevertheless, their connection to ice volume is complicated. It is still open whether this connection is stable over time or changes during interglacials compared to glacials. The simple assumption that it is constant makes it complicated to handle a first guess, as it can be quite far off.
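
As a rough illustration of what a first guess with a constant connection looks like, the following Python sketch maps a delta 18O series linearly onto global mean sea level and then lets a member drift away from it with a random-walk anomaly. Everything here is an assumption for illustration: the function names, the linear form, the round anchor values (3.2 and 5.0 permil, a lowstand of -120 m) and the step size are not taken from the paper.

```python
import random


def first_guess_sea_level(d18o, d18o_interglacial=3.2, d18o_glacial=5.0,
                          glacial_lowstand_m=-120.0):
    """Map delta 18O values (permil) linearly onto sea level in metres.

    The anchor values are illustrative round numbers, and the constant
    scaling is exactly the simple assumption the post warns about.
    """
    scale = glacial_lowstand_m / (d18o_glacial - d18o_interglacial)
    return [scale * (value - d18o_interglacial) for value in d18o]


def perturb(first_guess, rng, step_sigma_m=2.0):
    """Add a random-walk anomaly so a member can leave the first guess."""
    anomaly = 0.0
    varied = []
    for level in first_guess:
        anomaly += rng.gauss(0.0, step_sigma_m)  # hypothetical step size
        varied.append(level + anomaly)
    return varied
```

If the real connection changes between glacials and interglacials, the output of `first_guess_sea_level` can be far from the truth, and the anomaly has to make up the difference.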

All this poses challenges to the methodological and experimental design. Of course there are other constraints, like available computing time and storage, which require you to make choices. I will certainly talk about some of them in the post about massive ensemble data assimilation.

So what makes the LIG sea-level so complicated? It is the complexity of the problem and the small number of constraints, due to the sparsity and uncertainty of the data. This combination poses a huge challenge to everyone trying to shed light on this interesting research field. From the point of view of statistics, it is an interesting problem and a real test for any available statistical data assimilation procedure.

Background to “Estimating the sea level highstand during the last interglacial: a probabilistic massive ensemble approach”

This post is about the new paper, which came out recently. The title of the paper is “Estimating the sea level highstand during the last interglacial: a probabilistic massive ensemble approach” and it was published in Geophysical Journal International. It is an output from the iGlass project, which I worked for until last year.

The paper addresses the sea-level evolution over the last interglacial. For this we use a GIA model, which in model terminology is a simple model, and compare it to observations with the help of massive ensembles and a new data assimilation scheme. Apart from introducing and demonstrating the methodology, this paper addresses many problems of this topic. It is designed to offer a different view on the LIG sea-level and on the many complications we face in determining it and its uncertainties.

This post is an introduction to some other background posts, which will be published here in the next couple of weeks. The topics I would like to write on are:

  1. Sea-level in the LIG: What are the problems?
  2. Massive ensembles: How to make use of simple models?
  3. Data assimilation with massive ensembles
  4. What can we say now on the LIG sea-level?
  5. What will the future bring?

With these topics I hope to bring in some personal views and explain some basic points of this very complex paper. It is not a paper that has official headline numbers, as it is more a description of problems and an introduction of new methodologies. To get reliable numbers out, we have to rethink the problem, and this will certainly be done in the future.