Data peer review paper background: Why is quality decisive information for data?

Using information on data quality is nothing new. A typical approach is to use the uncertainty of the data, which acts as a weighting of individual data points in many data analysis methods. It is essential to know which data points or parts of a dataset can be trusted and which are probably questionable. To help data reusers with this, many datasets contain quality flags. They indicate when problems occurred during the measurement or when a quality control method raised doubts. Every scientist who analyses data has to pay attention to this information and is keen to know whether it explains, for example, why some points do not fit the analysis of the rest of the dataset.
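As a brief illustration of how such quality information typically enters an analysis, the sketch below combines a simple flag mask with inverse-variance weighting. It is only a minimal example: the arrays, the flag convention (0 = good, anything else = suspect) and the use of a weighted mean are assumptions for illustration, not part of any particular dataset standard.

```python
import numpy as np

# Hypothetical example data: measured values, their reported uncertainties,
# and quality flags (assumed convention: 0 = good, anything else = suspect).
value        = np.array([2.1, 2.3, 9.8, 2.2, 2.4])
uncertainty  = np.array([0.1, 0.1, 0.5, 0.1, 0.2])
quality_flag = np.array([0,   0,   2,   0,   0])

# Step 1: use the flags to exclude points that quality control marked as doubtful.
good = quality_flag == 0

# Step 2: weight the remaining points by the inverse of their variance,
# so that more uncertain measurements contribute less to the result.
weights = 1.0 / uncertainty[good] ** 2
weighted_mean = np.sum(weights * value[good]) / np.sum(weights)

# Uncertainty of the weighted mean under the usual independence assumption.
weighted_mean_error = np.sqrt(1.0 / np.sum(weights))

print(f"weighted mean = {weighted_mean:.3f} +/- {weighted_mean_error:.3f}")
```

In this toy case the flagged third point (9.8) is dropped entirely, while the remaining points contribute according to how precisely they were measured.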

With the institutionalised publication of data, the estimation of data quality reaches a new level. The reason is that published data are used not only by scientists who know the specific field well, but also by others. This interdisciplinary environment is a chance, but also a threat. The chances lie in new synergies, fresh views on a field and, even more, the huge opportunities of combining datasets in new ways. The risks, in contrast, are possible misunderstandings and misinterpretations of datasets and the belief that published datasets are ideal. These risks are best countered by proper documentation of the datasets. The aim of a data peer review is therefore to guarantee the technical quality (such as readability) of the dataset and a good documentation. This is all the more important since the dataset itself should not be changed at all.