Are data scientists “research parasites”?

Last week two scientists from medicine, Dan L. Longo and Jeffrey M. Drazen, published an editorial that was also picked up by some news outlets. In it they describe an emerging class of scientists who are no longer involved in the basic data collection, but who reinterpret the results of original studies with their own methods and combine several studies into a new dataset. They state that some scientists call this kind of researcher “research parasites” and claim that they are a problem for science overall. The authors prefer a system in which the scientists work together directly and publish their studies as co-authors.

As I describe myself as a data scientist, I am very used to these kinds of arguments. I wrote my PhD about data sharing and the quality assurance problems that come with it. Yes, my focus is on the geosciences, but in the end the issues are the same. Did they introduce new arguments? Not really; these problems have been discussed for years. The critical issue in their text is that they mix two topics that should not be mixed in this way. This is dangerous for science in general, as it might mean that the real issues are not addressed.

The first is the problem that when you share data, the receiving scientist does not have all the information and details of the original experiments. I wrote a longer piece on this some time ago. Yes, it is an issue, and it can usually only be addressed through proper standardisation and documentation. In medicine these problems might be even more severe, as humans are much harder to “measure” than our environment. But there are ways to handle this, and there are reasons to be careful with such studies; both have to be taught at universities to improve things in the future. It is essential for science to work on it.

The second issue is the credit problem. By pushing the argument that scientists should be co-authors of new research based on their data, the authors touch on an important point: how to give scientists credit for their work when it is used by others. This will be a topic here in the future, as I am currently involved in a paper that touches on this issue, but to make it short: it is important, it is hard to solve, and many cherished scientific practices would have to change. But it has to be independent of the quality of the datasets. How should someone who is combining, let’s say, 100 studies be able to go to everyone, talk to everyone, involve them in the study design and then publish with all of them as co-authors? It does not work. Yes, it might be the ideal way for small sample sizes, but overall it cannot be an excuse for documenting and publishing your data improperly.

Data sharing offers many important opportunities to generate completely new types of research. But there are issues along the way that have to be handled in the future. One is how best to pass the information from one scientist to the next; the other is how to give credit to the original data collectors. They have to be treated separately, and they have to be solved before we can see data science as proper science. But apart from that, these scientists are not parasites, even today. They are doing a new kind of research, which will become more and more important in the future. So we should think about how to make it work in the best way, and not try to stop it.

