The paper “Automated quality evaluation for a more effective data peer review”, which my co-author and I published in the Data Science Journal this week, started as a common background theme for my PhD thesis. The task was to find a way to bring the loose chapters on quality tests together.
The basic idea was to take a closer look at the publication process in general and, since it was the topic of the project at that time, at how it can be applied to data. This approach led to a lot of questions, especially about how scientists work, how they interact through their publications, and how they should work. The last question is quite philosophical and was addressed in part in Quadt et al. (2012).
Over the coming week I want to give some insights into the general topic of the paper and how it tries to address the problems that arose. The topics are:
- The philosophical problem of a missing data peer review
- What a data peer review could look like
- Statistical quality evaluation? What is that?
- Why quality is decisive information for data
- Opportunities for the future
I hope these topics will show a little of what is behind this paper and how it fits into the scientific landscape.
To fully understand the paper, it has to be read in connection with Quadt et al. (2012). In that paper we showed that traditional publications and data publications can be published in a comparable way, but that one major element is missing for this: data peer review.
The current paper aims to show that there are feasible ways to create an effective data peer review. It lays out a plan for how data could be analysed effectively, how the results could be evaluated, and how all this might be integrated into data centres. I do not assume this will happen in the current decade, since analysing and checking datasets at data centres involves huge complications. The aim of the paper is to show that, when the right steps are taken, the scientific community might profit enormously from the introduction of a proper data peer review. But how might it work?
A major element of my idea is a technical approach. If enough assistance were available for authors and reviewers, review times comparable to those currently needed for paper peer reviews might be achievable. The paper shows one way to do this: statistical quality evaluation. From a mathematical point of view it is relatively simple, which is also a necessity for such a scheme. Interdisciplinary science requires simple solutions, especially when trust is an issue. Transparency is a main factor in building trust, and that is why this way was chosen. Nevertheless, the approach offers many options for future enhancement, such as a theoretically possible fully automatic review of datasets based on experience gained with the scheme.