The environment for the publication of data is currently changing rapidly. New data journals are emerging, such as Nature's Scientific Data, launched two weeks ago, and Wiley's Geoscience Data Journal. The latter was also a focus of the PREPARDE project, which published a useful paper on data peer review a couple of weeks ago (Mayernik et al., 2014). Furthermore, more and more funding agencies require the publication of data, and this demand can be expected to put increasing pressure on scientists to make their work publicly available.
These developments are great, but here I would like to look further into the future. Where should we be in five or ten years, and what might be possible in, let's say, 30 or more years? The short answer is: a lot. But let's go into a little more detail.
Using information on data quality is nothing new. A typical way is through the uncertainty of the data, which in many data analysis methods gives individual data points something like a weight. It is essential to know which data points or parts of a dataset can be trusted and which are probably questionable. To help data reusers with this, many datasets contain flags. They indicate when problems occurred during the measurement or when a quality control method raised doubts. Every scientist who analyses data has to pay attention to this information, and is eager to know whether it explains why, for example, some points do not fit the analysis of the rest of the dataset.
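As a minimal illustration of how uncertainties and flags typically enter an analysis, the sketch below computes an inverse-variance weighted mean and skips flagged points. The function name `weighted_mean` and the flag convention (0 = good, any other value = questionable) are hypothetical choices for this example, not taken from any particular dataset standard.

```python
import math


def weighted_mean(values, sigmas, flags):
    """Inverse-variance weighted mean of the unflagged values.

    Hypothetical flag convention: 0 = good, any other value = questionable.
    Returns (mean, standard error of the mean).
    """
    num = 0.0
    den = 0.0
    for x, s, f in zip(values, sigmas, flags):
        if f != 0:
            continue  # skip points a quality check has flagged
        w = 1.0 / s**2  # smaller uncertainty -> larger weight
        num += w * x
        den += w
    if den == 0.0:
        raise ValueError("no unflagged data points")
    return num / den, math.sqrt(1.0 / den)


values = [10.0, 12.0, 11.0, 50.0]
sigmas = [1.0, 2.0, 1.0, 1.0]
flags = [0, 0, 0, 1]  # the outlier 50.0 is flagged
mean, err = weighted_mean(values, sigmas, flags)
print(f"{mean:.3f} +/- {err:.3f}")  # prints 10.667 +/- 0.667
```

If the flag on the last point were ignored, the outlier would pull the mean strongly upward, which is why reusers need to look at the flags before analysing the data.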
With the institutionalised publication of data, the assessment of data quality reaches a new level. The reason is that published data is used not only by scientists who know the specific field well, but also by others. This interdisciplinary environment is an opportunity, but also a threat. The opportunities lie in new synergies, fresh perspectives on a field and, above all, the huge potential of combining datasets in new ways. The risks, on the other hand, are possible misunderstandings and misinterpretations of datasets, and the belief that published datasets are flawless. These risks are best countered by proper documentation of the datasets. The aim of a data peer review is therefore to guarantee both the technical quality (such as readability) of the dataset and good documentation. This is all the more important because the datasets themselves should not be changed at all.
A central point of the new paper is the introduction of quality evaluation. But what does this mean, and why do I think it is important? To answer the first question, I need to give a little background. The words we commonly pair with quality are assurance and control. Depending on the definition, they focus on improving either the product or the processes that lead to the product. Since the product we are talking about is data, both aim to deliver better datasets.
In peer review, however, we are dealing with a different stage: the phase in which we want to quantify quality. To do this, a few points have to be made clear. The first is that quality is subjective. Especially when we think about the peer review process, it is important to keep in mind that it is not an objective process. The quality of the publication entity is defined by the opinion of the reviewers and the editor and therefore inevitably has a personal touch. Of course, the same is true for data peer review.
After discussing the place of data publications among the other types of scientific publication, I want to give an overview of what a data publication might look like in the future. As I have stated before, to gain trust, a data peer review should be comparable to the peer review of other publication types. The simplest way to achieve this is to model it as closely as possible on them, while including the changes that the different form of the publication entity makes necessary.
In philosophy, several great minds have addressed how scientists should work to gain knowledge. Among others, Bacon (1620) and Popper (1934) described different ways of gaining information and how it can be evaluated to become science. During my PhD I developed a relatively simple and general working scheme for scientists, which was published in Quadt et al. (2012). The paper analysed how this general scientific working scheme could be represented by scientific publications.
The way scientists should work (Quadt et al., 2012)
While the traditional journal paper, which has existed since the Philosophical Transactions of the Royal Society, edited by Henry Oldenburg in 1665, covers the whole scientific process, new forms have emerged in the last decade. Data papers (Pfeiffenberger & Carlson, 2011), short journal articles that focus on the experimental design and present the data from the experiment, fill a gap and are intended to simplify the use of data. Another form is the publication of data and metadata at a data centre itself, without an accompanying journal article.
This type of publication was part of my project at the time. A general question in it was how such a publication could be made comparable to the other types. The comparison showed that it is largely comparable, but that one important element is missing: peer review.
The paper “Automated quality evaluation for a more effective data peer review”, which my co-author and I published in the Data Science Journal this week, started as a common background theme for my PhD thesis. The task was to find a way to tie together the loosely connected chapters on quality tests.
The basic idea was to take a closer look at the publication process in general and, since it was the topic of my project at the time, at how it can be applied to data. This approach raised a lot of questions, especially about how scientists work, how they interact through their publications and how they should work. The last of these is quite philosophical and was partly addressed in Quadt et al. (2012).
Over the coming week I want to give some insight into the general topic of the paper and how it tries to address the problems that arose. The topics are:
- The philosophical problem of a missing data peer review
- What a data peer review could look like
- Statistical quality evaluation? What is that?
- Why quality is decisive information for data
- Opportunities for the future
I hope these topics will show a little of what is behind this paper and how it fits into the scientific landscape.
To fully understand the paper, it has to be read in connection with Quadt et al. (2012). In that paper we showed that traditional publications and data publications can be published in a comparable way, but that one major element is missing for this: data peer review.