To call it a new paper might be a little exaggerated, but it was published within the last year. It was submitted around a year ago and published online in November, while the final publication in the journal followed in April. The title is quite long, but it already tells you a lot about the content: "Turbulence and mixing by internal waves in the Celtic Sea determined from ocean glider microstructure measurements".
I do not want to talk about the whole paper, as my personal contribution was tiny compared to the great work of the other authors. Still, I would like to write a little about my part in it and what the task was.
Ocean gliders are a relatively new tool that is currently revolutionising the oceanographic observation system. As such, they are being tested for many applications, in the case of this article for microstructure measurements. My part started when the main work was already done. After all the measuring, processing and calculations, I was given two time series covering nearly nine days with the simple question: “What can you tell us about them?” Of course there were ideas about what could be in them, but as I do statistics, it is my task to make such statements watertight.
As always, you first have to get familiar with the data before you can investigate detailed questions. I worked on quality assurance during my PhD, and from this I have my standard tools to play around with data and learn about it. One of these tools is the histogram test, which is a nice test for inhomogeneities within datasets. The first thing you find with it is that there are obvious cycles within the time series, so you ask the experts which cycles are physically most probable. Of course you can also determine exactly which cycles are in the data by performing a spectral analysis, but when you make decisions on simplifying and clustering data, it is better to understand the physics behind it. After doing this it was obvious that the dataset consists of two different parts (with different statistical properties), which at first sight are quite unrelated. The advice to look at the data in the logarithmic sense was then the main driver for the subsequent analysis.
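The idea of looking for inhomogeneities by comparing histograms can be sketched in a few lines of Python. This is not the histogram test used for the paper, whose details are not given here, but a simplified stand-in: it splits a series into consecutive windows and measures how much the histogram changes from one window to the next (total variation distance). The function names, the window length, the bin count and the synthetic two-regime series are all assumptions made for illustration.

```python
import random

def histogram(values, lo, width, n_bins):
    """Relative frequencies of values in n_bins equal-width bins starting at lo."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)  # the maximum value goes into the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]

def adjacent_window_distances(series, window, n_bins=20):
    """Total variation distance between histograms of consecutive windows."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series is constant, no histogram to compare")
    width = (hi - lo) / n_bins
    hists = [histogram(series[s:s + window], lo, width, n_bins)
             for s in range(0, len(series) - window + 1, window)]
    return [0.5 * sum(abs(a - b) for a, b in zip(h1, h2))
            for h1, h2 in zip(hists, hists[1:])]

# Synthetic data (assumption): a log-transformed series whose mean and
# standard deviation change halfway through.
rng = random.Random(0)
log_series = ([rng.gauss(-8.0, 0.5) for _ in range(2000)]
              + [rng.gauss(-6.5, 1.0) for _ in range(2000)])

distances = adjacent_window_distances(log_series, window=500)
shift_index = max(range(len(distances)), key=distances.__getitem__)
print(shift_index, [round(d, 2) for d in distances])
```

In this synthetic case the largest distance falls between the fourth and fifth window, which is where the change of regime was placed. With real data the window length matters: it should be chosen with the physically plausible cycles in mind, otherwise the cycles themselves show up as inhomogeneities.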
When you assume a distribution for the data, it is important to test it. Having done this, it was simple to show that the time series are indeed, apart from the extremes, log-normally distributed. Performing the histogram test again, now with the logarithmic data, still showed the same regime shift as before, so the interesting question was whether the two parts themselves were also log-normally distributed. Using Q-Q plots it was simple to show that they were, and that only the mean and standard deviation in the logarithmic sense had changed. So my part came to an end; it was written up as one section at the end of the paper and I was happy with it.
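For readers who want to try such a check themselves, here is a minimal sketch of a Q-Q comparison for log-normality using only the Python standard library. It is not the code used for the paper. The two synthetic samples, their parameters and the helper names `qq_points` and `qq_correlation` are assumptions for illustration. The idea is to take the logarithm of the data, sort it, and pair it with the quantiles of a fitted normal distribution. If the data are log-normal, these pairs lie close to a straight line, which the correlation coefficient summarises in one number.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pairs (theoretical normal quantile, ordered log value) for a Q-Q plot."""
    logs = sorted(math.log(x) for x in sample)
    n = len(logs)
    fitted = NormalDist(mean(logs), stdev(logs))
    # Plotting positions (i + 0.5) / n stay strictly between 0 and 1.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, logs))

def qq_correlation(points):
    """Pearson correlation of the Q-Q points; close to 1 for a straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic stand-ins (assumption) for the two parts of a series: both are
# log-normal, but mean and standard deviation of the logarithm differ.
rng = random.Random(1)
part_one = [rng.lognormvariate(-8.0, 0.5) for _ in range(1000)]
part_two = [rng.lognormvariate(-6.5, 1.0) for _ in range(1000)]

r_one = qq_correlation(qq_points(part_one))
r_two = qq_correlation(qq_points(part_two))
print(round(r_one, 4), round(r_two, 4))
```

The correlation is only a summary. Plotting the points is still worthwhile, because deviations in the tails, such as the extremes mentioned above, are much easier to see in the figure than in a single number.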
So why are such analyses important? Why bring additional statistics into such a paper when it is already a solid one? Because these small, simple analyses contribute to the overall understanding of the data. Knowing the distribution of the data values and how it changes over time helps in modelling them and in understanding the physics. Giving people simple tools to detect inhomogeneities would also allow the data to be tested in real time and might open up new ways of measuring them. And yes, it gives nice figures, which show the reader that there really is something within the data that might need further exploration. Statistics is not all about equations; sometimes the right visualisation is equally important. All in all it was a nice example of how domain experts with their methodologies and a simple statistical analysis can together give quick and solid results.
M.R. Palmer, G.R. Stephenson, M.E. Inall, C. Balfour, A. Düsterhus, J.A.M. Green (2015): Turbulence and mixing by internal waves in the Celtic Sea determined from ocean glider microstructure measurements. Journal of Marine Systems, 144, 57-69