Image credit: Anna Wåhlin
Many research institutions, funding agencies, and journals require researchers to sign some kind of statement promising that they will share their data. But “sharing data” can mean lots of different things. A researcher survey found that more than half of respondents share at least some of their data (Wiley, 2014). In this context, data sharing can mean simply signing a statement that you will share your data on request, or emailing a copy of your file to a collaborator, or it can mean submitting supplementary materials to a journal, or it can mean fully documenting your data in a metadata record, and lodging the data in a public data repository so that it can be discovered and downloaded by other scientists, or anywhere in between. Ideally, all researchers would lodge their data in a public data repository and write full metadata records to maximise its discoverability.
At SOOS, we encourage all Southern Ocean researchers to share their data in the fullest sense of the word. Thus, we encourage researchers to lodge their data in a national, domain-specific, or institutional data centre that allows their data to be freely downloaded. Furthermore, these datasets should be accompanied by a comprehensive metadata record that is copied into all relevant metadata repositories, including NASA's Global Change Master Directory.
Because the Antarctic Treaty says you must
If you work in the waters or land south of 60°S, you are obliged to make your data publicly available, as soon as practicably possible.
Because it makes everyone’s datasets more useful
Seriously. You might collect data from one mooring station, or ten gliders, or one ship transect in one year. Other people collected data from other moorings, gliders, and ship transects in the same and other years. If you all make your data accessible, then you all get to use five, ten, twenty times as much data as you could collect using your own grants. Tenopir et al. (2011) found that two-thirds (67%) of scientists in their cross-disciplinary survey reported that lack of access to other researchers’ data is a major impediment to progress in their field, and half (50%) believed that lack of access to others’ data had hampered them from addressing a particular research question.
Because you’ll get more citations / visibility
If your data are used in another research paper, you get a citation. And another one if they also cite the paper that your data were first published in. For almost no effort on your part.
Because all the cool kids are doing it
Again. Seriously. More than half of global scientists now share their data, one way or another. And that figure will keep going up as more journals, institutions, and funding agencies make data sharing/storage a requirement of publication or funding. Get your processes sorted out now, and it will be easy to keep ahead of the herd. Against that, only 6% of researchers make all their data publicly available (Tenopir et al. 2011), so most people have room to improve.
Because when your data are documented and in an external repository, they’re safe from dead hard drives and your forgetfulness
Your student has moved on and nobody knows what the column labelled “MaxGit” means. Or your hard drive died and you can’t remember what folder the 2014 data was hiding in on the back-up server. Or whether version 8 turned out to be more accurate than version 9. Or was version 8a the final, final one? Or did your collaborator make some final corrections? Documenting and storing the cleanest version you have, while the data are fresh in your mind, will save you from these goose chases. If your data and metadata records are stored in a public repository, it doesn’t matter if you’ve changed institutions or are logging on from another computer. Your data will still be there, still clean, and still documented.
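To make "documenting" concrete, here is a minimal sketch in Python of a data dictionary check. It is an illustration only, not a SOOS or repository tool: the column names, descriptions, and the `undocumented_columns` helper are all hypothetical. The idea is simply that every column in a file should have a plain-language description before the person who named it moves on.

```python
import csv
import io

# Hypothetical data dictionary: each column name maps to a plain-language
# description and a unit. All names and descriptions are illustrative only.
DATA_DICTIONARY = {
    "station_id": {"description": "Mooring station identifier", "unit": None},
    "time_utc": {"description": "Sample time in UTC (ISO 8601)", "unit": None},
    "temp_c": {"description": "Sea water temperature", "unit": "degrees Celsius"},
}


def undocumented_columns(csv_text, dictionary):
    """Return the CSV header names that the dictionary does not describe."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in dictionary]


# A made-up file with one column that nobody wrote down.
sample = "station_id,time_utc,temp_c,MaxGit\nM1,2014-02-03T00:00:00Z,-1.2,7\n"
print(undocumented_columns(sample, DATA_DICTIONARY))  # → ['MaxGit']
```

Running a check like this before you submit a dataset flags the mystery columns while someone still remembers what they mean.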
Because the journal / funding body / institution made you
Even if the journal for this paper and your organisation don’t *yet* require you to make your data accessible, it’s inevitable that they will someday soon. Don't hold up your next paper submission while you and your collaborators (including the one on a three-month cruise) scramble to get your data into a repository. Sort it out now.
Because it’s easy and quick to do
It typically takes half an hour to write a comprehensive metadata record through the GCMD's metadata writing tool (you'll need to apply for a login first). The metadata writing tools at national and institutional data centres are similarly quick.
Because I’ve only just collected it and haven’t had a chance to publish it myself yet
It's fair enough that you want to have the first go at publishing the results of your research. SOOS is not advocating that you hand over your hard-won data to other researchers before you have a chance to analyse it yourself. Many data repositories allow you to embargo your data so that you get all the benefits of having your data secured in a repository early, while also giving you time to analyse and publish them. An alternative approach is to write a metadata record to be published somewhere like the SOOS metadata portal so that other researchers know that the data exist, and to update that record when you submit the data itself to a repository as you publish your findings.
Because my data are confidential
There may be legitimate reasons to restrict access to some data. The SCAR data policy notes that access may be restricted for confidential data about human subjects, or data with intellectual property issues, or because releasing the data may cause harm to the public or environment (e.g. locations of nest sites for endangered species). These are the only exceptions to the SCAR and SOOS policy of full, free, and open access to data.
Because I might not get properly cited for my work
You're right. Someone might pinch your data and use it without crediting you. They might do the same with findings from papers you've published too, and there's not much we can do to stop them in either case, except hope that peer review will pick up their misdeeds. The SOOS data policy clearly lays out the expectation that data users appropriately acknowledge all contributors and data sources. This will typically be in the form of a citation.
Because my data are a bit dodgy
A study of data sharing among psychology researchers showed that papers for which researchers failed to share data on request were associated with more errors in reporting their statistical results and with weaker p-values (Wicherts et al., 2011). Unwillingness to share was strongly associated with errors that affect statistical significance. The authors speculated about whether authors’ willingness to share data was affected by fear that sharing their data would lead to the discovery of errors, or whether those researchers with more rigorous data storage and sharing processes are also more likely to be rigorous in their analysis and reporting of results. This study looked at researchers who were directly asked to share their data, rather than those who had submitted their data voluntarily. In the absence of better evidence (and a spy camera in your office!) we’ll leave this one to you to decide - could preparing your data for publication serve as an extra check on errors in your paper?
Even if your data aren't perfect, they may well be more than adequate for some purposes. A good and honest metadata record will clearly identify potential errors in a dataset so that users can decide whether the data are adequate for their needs. For instance, you may have discovered that your count data are unreliable, but the dataset may still be more than adequate for a study that is interested only in presence/absence information.
Because I don’t have time
Do you have time to hunt out the best version of your data when you want it in three years' time? And then to remember what all the field names mean? Alternatively, do you have a few minutes now to fill in the fields in a metadata template, so that it's out of your hair for good?
Because I don’t know where to put it / my institution doesn't have a data repository
A cross-disciplinary survey of scientists found that just one-third of researchers (35%) are given the appropriate tools and technology for long-term data storage by their institution (Tenopir et al., 2011). In the same survey, 59% of respondents reported that their institution does not provide funds for data management beyond the life of a single project. Fortunately, even if your research institution is among those that lack a data repository or funding for data management, there are plenty of other places where your data will likely be welcome long-term, either publicly or under embargo. The SOOS data officer can help you find the best home for your data.
Because my data might be misused/misinterpreted
Tricky. It is possible that another researcher might misuse or misinterpret your data. The best insurance against this is to write your metadata record as clearly and fully as possible to ensure that other researchers understand the limitations of the dataset.
Because my data are too boring to be of interest to anyone else
How do you know? Your dataset may be too small to be interpretable on its own or fail to tell an exciting story on its own, but still be valuable when combined with other datasets. The longer it sits on your hard drive, the less likely it is to get used at all. Get it out there, and see what might come from the data you worked so hard to collect.
Because my post-doc left and nobody else knows what the files mean
At risk of I-told-you-so, this is why you should have got the post-doc to submit the data to a data repository three years ago. It's going to be a nuisance for you to write a metadata record now, but the longer you leave it, the worse it will get, as more staff move on and memories fade. Taking the time to unpick the mess now will make life a whole lot easier down the track.
Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, et al. (2011) Data Sharing by Scientists: Practices and Perceptions, PLoS ONE 6(6): e21101. doi:10.1371/journal.pone.0021101
Wiley (2014) Researcher Data Sharing Insights Infographic.