The new paper proposes an ATTAC^3 scheme as a guide for data handling, which should also be of interest to other fields. To be open: collecting these points in this way was not my idea but emerged during the writing process from one of the co-authors. I think it is a good basis for handling research data in general, so I will explain it a bit from my personal point of view in the following.
The abbreviation ATTAC^3 stands for Accessibility, Transparency, Trust, Availability, Continuity, Completeness, and Communication. All of these points are essential for a reliable exchange of data between scientists and would enable future-proof usage of it. The main details are discussed in the paper, but I want to say a few words about each of them:
- Data has to be accessible, ideally of course to everyone, but that does not necessarily mean it has to be free (even though free access is in many cases what makes data usable at all). Accessibility starts with offering the data in standardised, self-explanatory formats. When data is read in 20 years, it still has to be understandable by the scientists of the community at that time. Accessibility also includes a well-documented environment.
- Datasets have to be stored in a way that makes it fully transparent how each entry was obtained. Especially when an entry is based on calculations, the underlying raw data and the algorithm have to be given as well. In addition, information on the quality has to be provided for each data entry. Examples of how this can be done and why it is important were written up by me in an earlier paper. Nevertheless, it can sometimes be achieved very simply with a short comment.
- Researchers have to be able to trust the gathered data as if they had collected it themselves. This is an essential part of data exchange and as such requires some care. But the scientists who deliver the data for the databases also have to be able to trust that they still gain something from it. I will write about this in one of the next posts.
- Making data available at one point in time is good, but it has to remain accessible for the very long term to become part of the scientific process. Guaranteeing this long-term availability depends not only on a running web server, but also on continued funding.
- Databases have to reflect the current state of knowledge to stay relevant. As such, they have to be maintained and updated regularly, which again requires funding.
- Data entries within a database have to be complete. A data entry that has, for example, no uncertainty attached is for many applications as good as a non-existent entry.
- Communicating database entries just as bits and bytes is not always the best way. Visualising the content, offering it in many different formats, and making the data available so that everybody can use it is essential for the success of a good database.
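To make the transparency and completeness points above a bit more concrete, here is a minimal sketch in Python of what a self-describing data entry could look like. All field names and values are made up for illustration and are not taken from the paper or any real database:

```python
import json

# Hypothetical self-describing data entry: every value carries its unit,
# an uncertainty, and a short provenance comment saying how it was obtained.
entry = {
    "quantity": "transition_wavelength",
    "value": 656.28,
    "unit": "nm",
    "uncertainty": 0.02,      # completeness: no entry without its uncertainty
    "method": "measurement",  # transparency: how the value was obtained
    "comment": "Averaged over five runs; see lab notebook 2023-04.",
    "format_version": "1.0",  # accessibility: a versioned, documented format
}

def is_complete(record):
    """Reject records that lack an uncertainty or provenance information."""
    required = {"value", "unit", "uncertainty", "method"}
    return required <= record.keys()

# Communication: the same content can be offered in several formats;
# plain JSON is one simple machine-readable option.
serialized = json.dumps(entry, indent=2)

print(is_complete(entry))  # a record missing "uncertainty" would fail this check
```

A completeness check like `is_complete` could run whenever an entry is added, so that an entry without an uncertainty never enters the database in the first place.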
Many of the issues named here are self-explanatory and belong to good practice in many fields. Still, the new age of massive data availability and exchange requires even more care on these points. A good database depends not only on the number of entries, but also on the form and the persistence with which it is made available.