In the new paper on the LIG sea-level investigation with massive ensembles, I try to demonstrate how complicated it actually is to model the LIG sea level. There are many reasons for this, and they are certainly not unique to this specific problem but apply to palaeoclimatology in general. So I would like to highlight a few specifics that I encountered while preparing this paper.

I have written before about the special nature of palaeo-sea-level data in general. From a statistical point of view, the available data are inhomogeneous because they come from different sources and are based on different measurement principles (e.g. analysis of data from corals or boreholes). Their two-dimensional uncertainty (in time and in value), which is usually also quite large, makes it complicated to apply standard statistical procedures. Far too many of them assume that at least one of the two dimensions is negligible, and when non-normal uncertainty distributions are added, this becomes a real challenge. It is also nearly certain that the dataset contains outliers. There is no clear way to identify them, so it is unclear whether any given data point is valid. Finally, we have to accept that there is hardly any real check to find out whether the outliers identified by your statistical method are just faulty measurements or show special features of the physical system. And of course, a huge problem is that the number of available data points is very low, which makes it even harder to constrain the sea level at a specific point in time.
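To make the two-dimensional uncertainty a bit more concrete, here is a minimal sketch of one common way to deal with it: a Monte Carlo resampling that perturbs both the age and the elevation of every data point. This is not the procedure of the paper. The indicator values, the function names and the Gaussian error model are all assumptions made for illustration (real uncertainties are often non-normal, as mentioned above).

```python
import random

# Hypothetical sea-level indicators, invented purely for illustration:
# (age in ka, age 1-sigma, elevation in m, elevation 1-sigma)
INDICATORS = [
    (128.0, 2.0, 3.0, 1.5),
    (125.0, 1.0, 6.0, 2.0),
    (122.0, 3.0, 4.5, 1.0),
    (118.0, 2.5, -1.0, 3.0),
]


def sample_indicators(indicators, rng):
    """Draw one realisation, perturbing BOTH age and elevation.

    Assumption: independent Gaussian errors in each dimension.
    """
    return [
        (rng.gauss(age, s_age), rng.gauss(elev, s_elev))
        for age, s_age, elev, s_elev in indicators
    ]


def window_mean_distribution(indicators, t_min, t_max, n_draws=5000, seed=0):
    """Sorted mean elevations of the points inside [t_min, t_max], one per draw."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_draws):
        inside = [
            elev
            for age, elev in sample_indicators(indicators, rng)
            if t_min <= age <= t_max
        ]
        if inside:  # a draw can leave the time window empty
            means.append(sum(inside) / len(inside))
    return sorted(means)


means = window_mean_distribution(INDICATORS, 124.0, 127.0)
print("draws with data in the window:", len(means))
print("median elevation (m):", round(means[len(means) // 2], 2))
print(
    "5-95% range (m):",
    round(means[int(0.05 * len(means))], 2),
    "to",
    round(means[int(0.95 * len(means))], 2),
)
```

Because the ages are resampled too, the set of points that falls into the time window changes from draw to draw. That is exactly the effect that gets lost when the time uncertainty is treated as negligible.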

Another point of concern is the combination of two complex systems, which only together give a result that is comparable to the observations. On the one side there are the ice sheets. It is hard to put physical constraints on their spatial and temporal development (especially in simple models). We tried this with assumptions about their connection to the current ice sheets (or better, those of the past few thousand years), but it is hard to tell what the ice sheets at that time really looked like. Of course, complex model studies on this exist, but what a study like ours needs are consistent ice sheets at a relatively high temporal resolution (e.g. 500 years) over a very long time (more than 200,000 years). In addition, we would like to have several possible realisations of them (I used 39,000 runs, the majority of which have unique ice sheet histories). That is a challenge, to say the least, so statistical ice sheet creation becomes a necessity.

The other complex model is the Earth. It reacts to the ice sheets and to their whole history. So the combination of these two models (the statistical ice sheet model and the physical GIA model) is key to a successful experiment. We use simple models here, which always have their benefits but also their huge disadvantages. I will talk more about that in the next post on this topic. But in this case these models are special. At least in theory they are non-Markovian, which means that not only the last state and the changes to the system since then play a role, but also system states from longer ago. Furthermore, future states also play a role, although they have a much smaller influence. This has a lot to do with the experimental setup, but it puts constraints on what you can do with your analysis procedures. It also requires you to analyse a very long period of development, in our case the last 214,000 years, even when you are only interested in what happened at around 130,000 years before present.
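The dependence on the whole history can be illustrated with a toy example. The sketch below is not a GIA model; it only assumes that the response to each past load change relaxes with a few hypothetical time scales (the `modes` values are invented), and it covers only the dependence on the past, not the small influence of future states mentioned above.

```python
import math


def toy_response(load_history, dt=0.5, modes=((0.6, 3.0), (0.4, 20.0))):
    """Response at the final time step as a weighted sum over ALL past load changes.

    load_history: ice load (arbitrary units) at equally spaced times, oldest first.
                  The system is assumed to be in equilibrium with the first value.
    dt:           time step in kyr (0.5 kyr mirrors the 500-year resolution).
    modes:        hypothetical (amplitude, relaxation time in kyr) pairs.
    """
    n = len(load_history)
    response = 0.0
    for i in range(1, n):
        change = load_history[i] - load_history[i - 1]
        elapsed = (n - 1 - i) * dt  # time since this load change happened
        # Each mode relaxes towards full compensation of the load change.
        relaxed = sum(a * (1.0 - math.exp(-elapsed / tau)) for a, tau in modes)
        response += change * relaxed
    return response


# Two hypothetical histories that are identical over the last five steps
# but differ in what happened long before.
recent = [1.0, 0.8, 0.5, 0.2, 0.0]
history_a = [1.0] * 40 + recent                 # ice sheet present all along
history_b = [0.0] * 20 + [1.0] * 20 + recent    # ice sheet built up later

print("response A:", round(toy_response(history_a), 3))
print("response B:", round(toy_response(history_b), 3))
```

Although both histories agree for the most recent steps, the responses differ, because the build-up in the second history is still relaxing at the end of the run. This is why a long spin-up period has to be part of every run, even if only a short interval is of interest.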

The so-called delta 18O curves are another factor in this. We use them to create a first guess of the ice sheets, which we then vary. Nevertheless, their connection to ice volume is complicated. It is still an open question whether this connection is stable over time or changes during interglacials compared to glacials. Simply assuming that it is constant makes it complicated to handle a first guess, as it can be quite far off.
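As a rough illustration of why the assumed relation matters, the following sketch converts delta 18O values into a first-guess sea level with a simple linear scaling. The reference value, the scaling factors and the sample values are all hypothetical numbers chosen for illustration, and the linear form itself is an assumption rather than the relation used in the paper.

```python
def first_guess_sea_level(d18o, d18o_present=3.2, metres_per_permil=-70.0):
    """Linear first guess of sea level (m, relative to present) from a d18O value.

    Both default parameters are hypothetical and only serve the illustration.
    """
    return (d18o - d18o_present) * metres_per_permil


# Hypothetical d18O values (permil) for three climate states.
curve = {"glacial": 4.9, "transition": 4.0, "interglacial": 3.1}

for label, value in curve.items():
    constant = first_guess_sea_level(value)
    # Alternative assumption: a different scaling factor, as it could apply
    # if the relation is not stable over time. The value -50 is arbitrary.
    alternative = first_guess_sea_level(value, metres_per_permil=-50.0)
    print(f"{label}: {constant:.1f} m vs. {alternative:.1f} m")
```

Even in this toy setting the two scalings disagree by tens of metres for the glacial value, which gives a feeling for how far off a first guess can be when the relation is simply assumed to be constant.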

All of this poses challenges for the methodological and experimental design. Of course there are other constraints, like available computing time and storage, which require you to make choices. I will certainly talk about some of them in the post about massive ensemble data assimilation.

So what makes the LIG sea level so complicated? It is the complexity of the problem and the small number of constraints, due to the sparsity and uncertainty of the data. This combination poses a huge challenge to everyone trying to shed light on this interesting research field. From the point of view of statistics, it is an interesting problem and a real test for any available statistical data assimilation procedure.