# Massive ensemble paper background: Data assimilation with massive ensembles

Within the new paper we developed and modified a data assimilation scheme basing on simple models and up to a point Bayesian Statistics. In the last post I talked about the advantages and purposes of simple models and this time I would like to talk about their application.

As already talked about, we had a simple GIA model available, which was driven by a statistical ice sheet history creation process. From the literature, we had the guideline that the sea level over the past followed roughly the dO18 curve, but that high deviations from this in variation and values can be expected. As always in statistics there are several ways to perform a task, basing on different assumptions. To design a contrast to the existing literature, the focus was set to work with an ensemble based approach. Our main advantage here is that we get at the end individual realisations of the model run and can show individually how they perform compared to the observations.

The first step in this design process of the experiment is the question how to compare a model run to the observations. As there were several restrictions from the observational side (limited observations, large two-dimensional uncertainties etc.), we decided to combine Bayesian statistics with a sampling algorithm. The potential large number of outliers also required us to modify the classical Bayesian approach. As a consequence, we were able at that point to estimate for each realisation of a model run a probability.

In the following the experimental design was about a general strategy, how to create the different ensemble members so that they are not completely random. Even with the capability to be able to create a lot of runs, even realisations in the order of 10,000 runs are not sufficient to determine a result without a general strategy. This lead us to a modified form of a Sequential Importance Resampling Filter (SIRF). The SIRF uses a round base approach. In each round a number of model realisations are calculated (in our case 100) and afterwards evaluated. A predefined number of them (we used 10), the best performers of the round, are taken forward to the next and act as seeds for the new runs. As we wanted a time-wise determination of the sea-level, we chose the rounds in this dimension. Every couple of years (in important time phases like the LIG more often) a new round was started. In each the new ensembles branched from their seeds with anomaly time series for their future developments. Our setup required that we always calculate and evaluate full model runs. To prevent that very late observations drive our whole analysis, we restricted the number of observations taken into account for each round. All these procedures led to a system, where in every round, and with this at every time step of our analysis, the ensemble had the opportunity to choose new paths for the global ice sheets, deviating from the original dO18 curve.

As you can see above, there were many steps involved, which made the scheme quite complicate. It also demonstrate that standard statistics get to its limits here. Many assumptions are required, some simple and some tough ones, to generate a result. We tried to make these assumptions and our process as transparent as possible. As such, our individual realisations, basing on different model parameters and assumptions on the dO18 curve, show that it is hard to constrain the sea-level with the underlying datasets for the LIG. Of course we get a best ice-sheet history under our conditions, that is how our scheme is designed, but it is always important to evaluate whether the results we get out of our statistical analysis make sense (basically if assumptions hold). In our case we could say that there is a problem. It is hard to say whether it is the model, the observations or the statistics itself which make the largest bit of it, but the observations are the prime candidate. Reasons are shown in the paper together with much more information and discussions on the procedure and assumptions.

This site uses Akismet to reduce spam. Learn how your comment data is processed.