Playing around with data can be quite fun and sometimes delivers interesting results. I did this a lot in the past, mainly out of necessity during my PhD. There I developed some methods for quality assurance of data, which of course needed some interesting applications. So every time a nice dataset went live, I ran it through my methods, and usually the results were quite boring. The main reason is that these methods are designed to identify inhomogeneities, and a lot of the data published nowadays is already quality controlled (homogenised), which makes it quite hard to identify new properties within a dataset. Model data in particular is often quite smooth, so it is necessary to look at quite old data to find something really interesting.
So I took the same dataset and played around with it. What is interesting about the new version is that the dataset now has 100 different ensemble members, which deliver an extra dimension. I took the same data as Aslak, that is, the global mean monthly values.
The only exception is that I did not take the 12-year running mean, as my methods are capable of handling multidimensional datasets at once. The method used is the histogram test (Düsterhus & Hense, 2012), with the Earth Mover’s Distance (EMD) as the measure and a multidimensional enhancement.
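To give a feeling for what the EMD measures, here is a minimal one-dimensional sketch in Python. It is not the multidimensional implementation behind the plots in this post. It only assumes two histograms on the same equal-width bins, and the function names `histogram` and `emd_1d` are made up for illustration. In one dimension the EMD reduces to the summed absolute difference of the two cumulative histograms.

```python
from itertools import accumulate


def histogram(values, edges):
    """Relative frequencies of `values` in the bins given by ascending `edges`.

    Values outside the edges are ignored; the last bin includes its upper edge.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the given bin edges")
    return [c / total for c in counts]


def emd_1d(p, q, bin_width=1.0):
    """EMD between two normalised histograms on identical equal-width bins.

    Illustrative 1-D special case: the sum of absolute differences of the
    cumulative histograms, scaled by the bin width.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    diffs = [a - b for a, b in zip(p, q)]
    return bin_width * sum(abs(c) for c in accumulate(diffs))
```

Moving all of the mass from the first to the third bin, for example, costs two bin widths: `emd_1d([1, 0, 0], [0, 0, 1])` gives `2.0`, while two identical histograms give `0.0`.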
So let’s see what the HadCRUT 4.2 data shows us. I start with the annual data.
The result you see is a difference matrix, which means that each column and row stands for a year and each entry describes the difference between the two associated years. What we see are typical patterns, which are partly affected by the chosen colour scale. We see four larger blocks: 1850 to around 1935, around 1935 to 1978, 1978 to 1998, and 1998 up to today. For each of them the usual explanations can be given, as the dataset and its predecessors are widely discussed in the literature. Interesting for historians is the visibility of the FGGE program in the mid 1970s.
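How such a difference matrix is put together can be sketched in a few lines of Python. This is only an illustration under assumptions of mine: every year is represented by a flat list of values (for example all months of all ensemble members), and the distance is a simple one-dimensional stand-in for the actual histogram test, namely the mean absolute difference of the sorted samples, which is the 1-D EMD for two samples of equal size. The names `sample_distance` and `difference_matrix` and the toy numbers are hypothetical and not HadCRUT values.

```python
def sample_distance(x, y):
    """1-D EMD between two equally sized samples.

    Stand-in for the real test: mean absolute difference of sorted values.
    """
    if len(x) != len(y):
        raise ValueError("samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


def difference_matrix(samples_by_year, distance=sample_distance):
    """Return (years, matrix) where matrix[i][j] compares year i with year j."""
    years = sorted(samples_by_year)
    n = len(years)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(samples_by_year[years[i]], samples_by_year[years[j]])
            # The distance is symmetric, so fill both triangles at once.
            matrix[i][j] = matrix[j][i] = d
    return years, matrix


# Toy numbers, invented purely to show the call.
toy = {
    1850: [-0.4, -0.3, -0.5],
    1851: [-0.3, -0.2, -0.4],
    2000: [0.3, 0.4, 0.5],
}
years, matrix = difference_matrix(toy)
```

Neighbouring years with similar values end up with small entries close to the diagonal, while years from different regimes produce the large off-diagonal blocks described above.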
Of course we can do more with the dataset and the method than just showing the annual values. For example, we can look at how the ensembles have changed. For this I subtracted the annual ensemble mean value.
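As a rough sketch of this step in Python: I assume here a layout `data[year]` that holds one list of monthly values per ensemble member, and I read "annual ensemble mean" as the mean over all members and all months of a year. Both the layout and the function name `remove_annual_ensemble_mean` are assumptions made for illustration.

```python
def remove_annual_ensemble_mean(data):
    """Subtract each year's mean over all members and months.

    `data[year]` is a list of ensemble members, each a list of monthly values.
    The result has the same layout and contains only the spread around the mean.
    """
    anomalies = {}
    for year, members in data.items():
        all_values = [v for monthly in members for v in monthly]
        mean = sum(all_values) / len(all_values)
        anomalies[year] = [[v - mean for v in monthly] for monthly in members]
    return anomalies
```

After this step every year is centred on zero, so any remaining difference between two years comes from how the ensemble is distributed rather than from the warming signal itself.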
As expected, the values before 1880 differ from those after. This is probably due to the lack of data in this timeframe. The behaviour can have two sources: either the ensemble members do not agree with each other, or the values are really different in that time span compared to the later one. Probably it’s a mix of both, but I do not want to go too much into the details here.
The third and last view on the data is a view on the ensemble members. Are they consistently different from each other, or does the ensemble number have an effect on the results?
Well, not really. There is no single ensemble member that can be identified as fundamentally different from the others, although some differ more from the majority than others do. Patterns related to the ensemble number are not really identifiable; only in the high 50s do some successive members tend to disagree with the other members.
All in all it’s an interesting dataset, and different methods of looking at it will identify different things. Therefore it’s always fun to give it a try and play with it. In terms of multidimensional datasets, the HadCRUT dataset is certainly one of the more interesting ones.