Database paper background: What does ATTAC^3 mean for scientific data handling?

The new paper proposes an ATTAC^3 scheme as guidance for data handling, which will also be of interest to other fields. To be open, collecting these points in this way was not my idea; it came from one of the co-authors during the writing process. I think it is a good basis for handling research data in general, so in the following I will explain it a bit from my personal view.



Are data scientists “research parasites”?

Last week two scientists, Dan L. Longo and Jeffrey M. Drazen, both from medicine, published an editorial that was also picked up by some news outlets. In it they describe an emerging class of scientists who are no longer involved in the basic data collection, but who reinterpret the results of original studies with their own methodologies and combine several studies into a new dataset. They state that some scientists call researchers of this kind “research parasites” and claim that they are a problem for science overall. The authors prefer a system in which the scientists work directly together and publish their studies as co-authors.


Big Data – More risks than chances?

There is an elephant in the room, at every conference in nearly every discipline. The elephant is so extraordinary that everyone seems to want to watch and hype it. In all this commotion a lot of common sense seems to get lost, and in particular the little mice creeping around the corners are overlooked.

The big topic is Big Data, the next big thing that will revolutionise society, at least if you believe the advertisements. The topic has grown in the past few years into something really big, especially as social media companies regularly demonstrate the opportunities behind the term. Funding agencies and governments have seen this and put Big Data at the top of their science agendas. As a consequence, masses of scientists sit in conference sessions about Big Data, where the discussions range from what it is to how it can be used. Nevertheless, there are a lot of traps in this field, which might have serious consequences for science in general.

Data peer review paper background: Why is quality decisive information for data?

Using information on data quality is nothing new. A typical way to do it is through the uncertainty of the data, which gives the individual data points something like a weighting in many data analysis methods. It is essential to know which data points or parts can be believed and which are probably questionable. To help data reusers with this, a lot of datasets contain flags. They indicate when problems occurred during the measurement or when a quality control method raised doubts. Every scientist who analyses data has to look at this information, and is eager to know whether it explains why, for example, some points do not fit into the analysis of the rest of the dataset.
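To make the idea of weighting by uncertainty and filtering by flags a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the function name `weighted_mean`, the convention that a flag value of 0 means "no known problem", and the example numbers are all hypothetical and not taken from any particular dataset or from the paper. The weighting used is the common inverse-variance scheme, which is one of several possible choices.

```python
import math


def weighted_mean(values, sigmas, flags, good_flag=0):
    """Inverse-variance weighted mean of the points marked as good.

    Assumption for this sketch: flag == good_flag means the point passed
    quality control; any other flag value means it is questionable.
    """
    # Keep only points with a good flag and a usable (positive) uncertainty.
    kept = [
        (v, s)
        for v, s, f in zip(values, sigmas, flags)
        if f == good_flag and s > 0
    ]
    if not kept:
        raise ValueError("no usable data points")

    # Points with a smaller uncertainty get a larger weight.
    weights = [1.0 / (s * s) for _, s in kept]
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, kept)) / total

    # Standard error of the inverse-variance weighted mean.
    return mean, math.sqrt(1.0 / total)


# Hypothetical example: the third point was flagged during quality control.
values = [10.0, 12.0, 50.0]
sigmas = [1.0, 2.0, 1.0]
flags = [0, 0, 1]

mean, err = weighted_mean(values, sigmas, flags)
print(f"weighted mean = {mean:.2f} +/- {err:.2f}")
```

In this sketch the flagged point is dropped entirely, which is the simplest option; depending on what a flag means in a given dataset, down-weighting or inspecting the point by hand may be more appropriate.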

With the institutionalised publication of data, the estimation of data quality reaches a new level. The reason is that published data is used not only by scientists who know the specific field well, but also by others. This interdisciplinary environment is an opportunity, but also a threat. The opportunities lie in new synergies, in bringing new views to a field and, even more, in the huge potential of new dataset combinations. In contrast, the risks are possible misunderstandings and misinterpretations of datasets and the belief that published datasets are ideal. The risks can best be countered by proper documentation of the datasets. Therefore, the aim of a data peer review is to guarantee the technical quality (like readability) of the dataset and a good documentation. This is even more important since the datasets themselves should not be changed at all.