In philosophy, several great minds have addressed how scientists should work to gain knowledge. Among others, Bacon (1620) and Popper (1934) described different ways of gaining information and of evaluating it so that it becomes science. During my PhD I developed a relatively simple and general working scheme for scientists, which was published in Quadt et al. (2012). The paper analysed how this general scientific working scheme can be represented by scientific publications.
The traditional journal paper has existed since Henry Oldenburg began editing the Philosophical Transactions of the Royal Society in 1665, and it covers the whole scientific process. In the last decade, however, new forms have emerged. Data papers (Pfeiffenberger & Carlson, 2011) are short journal articles that focus on the experimental design and present the data from the experiment. They filled a gap and are meant to simplify the reuse of data. Another form is the publication of data and metadata at a data centre itself, without an accompanying journal article.
This type of publication was part of my project at that time. A central question was how such a publication can be made comparable to the other types. The comparison showed that it is largely comparable, but that one important element is missing: peer review.
It is important to state that by peer review I mean a review comparable to the one applied to traditional paper publications. What exactly that involves will be the subject of another post.
The missing element raises some philosophical problems, mainly because nowadays everything has to be peer-reviewed to be accepted as science. One such problem was raised in the reviews of the IPCC reports: why do we require that the literature on which the report is based be peer-reviewed, while the datasets on which that literature is based are not peer-reviewed themselves?
There are many arguments for and against data peer review, and I would like to discuss the most common ones. A typical point people make when I talk to them about this topic is that a proper data peer review is simply impossible. With this argument in mind, they usually ignore the problem or classify it as unimportant. Another argument is that we do not need data peer review at all. Looking at the history of scientific publishing, this is a generally valid statement: peer review has not always existed, nor was it always as well established as it is today.
It is difficult to define when peer review really started, and whether the reviews in the first journals already counted as peer review in today's sense. Some well-established journals did not have a peer review system of the kind we expect today until the beginning of the 20th century. All the literature I read on this topic during my PhD (yes, apart from the data handling journals, it was mostly medical literature) stated that from the 1950s onwards peer review can be assumed to have been the norm for a paper to be accepted as scientific. We all know that science existed well before that time, and therefore one can argue that data peer review is not necessary to make datasets part of science.
Additionally, the papers written about the data act as a kind of review, which leads some to describe datasets used in peer-reviewed literature as peer-reviewed themselves. Given the average time reviewers spend on a paper, it is self-evident that this view is questionable. Data papers are different, because there the data are clearly the main topic. Nevertheless, time might be a critical factor there as well when additional (short) journal papers have to be reviewed.
The last argument I want to discuss here is the one specifically addressed by the new paper. Some acknowledge that data peer review is theoretically possible, but state that it cannot be achieved in a reasonable time. Of course it is hard to imagine how to do it, since most people have a simple picture in mind: you get a gigabyte of data and are expected to deliver a report next week on whether, in your view, the dataset is free of errors. And since every word in a journal article has to be correct, the same would have to hold for every bit of data.
My approach tries to simplify this by using technical measures based on statistics. Nevertheless, more problems than this one have to be solved before data truly deserve to be classified as peer-reviewed. My goal was to show that a proper report on data, one that might be acceptable as a review report, can be created in a reasonable time. Whether this or a similar scheme should be implemented is a question for the theory of science, which has to be explored further and decided in and by the individual scientific communities.