LibGuides: Open Science: Open research data and materials

Questions, obstacles and common misconceptions

Is it sufficient to make my data openly available?

No - openness is a necessary but not sufficient condition for maximum reuse. Data have to be FAIR in addition to open.

What do the FAIR principles mean/imply for different stakeholders/audiences?

Researchers may be reluctant to share their data because they fear that others will reuse them before they have made full use of the data themselves, or that others might not fully understand the data and therefore misuse them. You can publish your data with metadata to make them findable, but set an embargo period on the data so that you can publish your own article(s) first.

Is making my data FAIR a lot of extra work?

Not necessarily. Making data FAIR is the responsibility not only of the individual researcher but of the whole research group. The best way to ensure your data are FAIR is to create a Data Management Plan (DMP) and plan everything beforehand. During data collection and processing, follow the standards of your discipline and the measures recommended by your chosen repository.

I want to share my data. How should I license them?

First of all, think about who owns the data: it may be a research funder or the institution you work for. Then, think about authorship. Applying a suitable license to your data is crucial to making them reusable.

I cannot make my data directly available - they are too large to share conveniently/have restrictions related to privacy issues. What should I do?

You should talk to experts at domain-specific repositories about how to provide sufficient instructions to make your data findable and accessible.

What is it?

Research data are often the most valuable output of a research project. They are used as primary sources that underpin scientific research and enable the derivation of theoretical and applied findings. Open research data are data that can be freely accessed, reused, remixed and redistributed, for academic research and teaching purposes and beyond. Ideally, open data have no restrictions on reuse or redistribution, and are appropriately licensed as such. In some cases, e.g. to protect the identity of human subjects, special or limited restrictions of access are set. Openly sharing data exposes them to inspection, forming the basis for research verification and reproducibility, and opens up a pathway to wider collaboration.

The best practice recommendation for open research data is for the data to be as open and FAIR as possible, while accounting for ethical, commercial and privacy constraints on sensitive or proprietary data.

FAIR principles

The FAIR data principles are a core set of principles to optimise the reusability of research data.

Findable: The first thing to be in place to make data reusable is the possibility to find them. It should be easy to find the data and the metadata for both humans and computers. Automatic and reliable discovery of datasets and services depends on machine-readable persistent identifiers, such as DOIs, and metadata.
Accessible: The (meta)data should be retrievable by their identifier using a standardised and open communications protocol that allows for authentication and authorisation where necessary. Also, metadata should remain available even when the data are no longer available.
Interoperable: The data should be able to be combined with and used with other data or tools. The format of the data should therefore be open and interpretable for various tools, including other data records. The concept of interoperability applies both at the data and metadata level. For instance, the (meta)data should use vocabularies that follow FAIR principles.
Re-usable: Ultimately, FAIR aims at optimising the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings. Also, the conditions for reuse of the (meta)data should be stated with clear and accessible licenses.
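
As a rough illustration of what "machine-readable identifiers and metadata" can look like in practice, the Python sketch below builds a small metadata record and checks it for a few fields that the principles above call for. The field names, the list of required fields and the DOI are assumptions made up for this example. Real repositories follow an established metadata schema (for example DataCite or Dublin Core) and apply their own validation rules.

```python
import json
import re

# Hypothetical field names for illustration only; real repositories follow
# a metadata schema such as DataCite or Dublin Core.
REQUIRED_FIELDS = ("identifier", "title", "creators", "license", "format")

# Loose pattern for the general shape of a DOI: "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def missing_fair_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def has_doi_identifier(record):
    """Check whether the identifier looks like a DOI (syntax only)."""
    return bool(DOI_PATTERN.match(record.get("identifier", "")))


record = {
    "identifier": "10.1234/example-dataset",  # made-up DOI, does not resolve
    "title": "Example survey responses",
    "creators": ["Doe, Jane"],
    "license": "CC-BY-4.0",   # a clear license supports reuse
    "format": "text/csv",     # an open format supports interoperability
}

print(missing_fair_fields(record))   # []
print(has_doi_identifier(record))    # True
print(json.dumps(record, indent=2))  # machine-readable form of the record
```

A check like this only confirms that the fields are present. Whether the descriptions are good enough for someone else to reuse the data still needs human judgement.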

Publishing data

Most researchers are more or less familiar with Open Access publishing of research articles and books. More recently, and for the reasons mentioned above, data publishing has gained increasing attention. More funders expect the data produced in research projects they finance to be findable, accessible and as open as possible.

There are several ways to make research data accessible:

Publishing data as supplemental material associated with a research article, typically with data files hosted by the publisher.
Hosting data on a publicly available website, with files available for download.
Depositing data in a repository that has been developed to support data publications, e.g. figshare.
A large number of general and domain or subject specific data repositories exist which can provide additional support to researchers when depositing their data, e.g. DIRISA.
Publishing a data paper about the dataset, which may be published as a preprint, in a journal, or in a data journal that is dedicated to supporting data papers. The data may be hosted by the journal or hosted separately in a data repository. Examples of data journals include Data Science Journal (by CODATA) and Data in Brief (by Elsevier). See these examples of data papers by UFS authors.

Make use of the UFS's institutional data repository, figshare, where possible. Funders might require you to deposit your data in a specific repository. re3data can be used to discover other available data repositories.

Important: Start planning where to deposit or publish your research data as early as your data management plan (DMP). Consider which data and associated metadata, documentation and code will be deposited. Ask yourself how long the data will need to be retained, and for how long they should remain reusable. How will your data be made available? What access will you provide? Remember, if your dataset is to 'count' as a publication/research output, it should follow a similar publication process as an article - properly documented with metadata, reviewed for quality, searchable and discoverable in databases, and citable in articles.

Data citation

Data citation services help research communities discover, identify, and cite research data (and often other research objects*) with confidence. This typically involves the creation and allocation of Digital Object Identifiers (DOIs) and accompanying metadata through services like DataCite and CrossRef, and can be integrated with research workflows and standards. This enables research articles to be linked to any underlying data, and legitimises research data as contributions to the process of scholarly communication. It can also help to recognise new metrics and publication models, as well as pave the way for rewarding data sharing.

Read more about data citation principles.
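
To show how a DOI and its accompanying metadata turn into something citable, here is a minimal Python sketch that assembles a citation string for a dataset. It loosely follows the pattern recommended by DataCite (creator, publication year, title, version, publisher, identifier). The function name, its parameters and all example values are assumptions for illustration. Journals and repositories may prescribe a different citation style.

```python
def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Build a dataset citation: Creators (Year). Title. Version. Publisher. DOI link."""
    parts = [f"{'; '.join(creators)} ({year})", title]
    if version:
        parts.append(f"Version {version}")
    parts.append(publisher)
    citation = ". ".join(parts) + "."
    # Expressing the DOI as a full URL makes the citation directly resolvable.
    return f"{citation} https://doi.org/{doi}"


# All values below are made up for this example.
print(format_data_citation(
    creators=["Doe, Jane", "Roe, Richard"],
    year=2023,
    title="Example survey responses",
    publisher="Example Repository",
    doi="10.1234/example-dataset",
    version="1.0",
))
# Doe, Jane; Roe, Richard (2023). Example survey responses. Version 1.0. Example Repository. https://doi.org/10.1234/example-dataset
```

Many repositories generate a suggested citation like this automatically once a dataset has been published, so in practice you can often copy it from the dataset's landing page.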

*In addition to data sharing, the openness of research relies on sharing of materials. Here are some examples of what you can share, although it will be discipline specific or sometimes unique to a lab:

Reagents used to create chemical or other reactions.
Protocols of scientific experimental observations.
Notebooks, containers, software and hardware, which aid reproducible analysis through literate programming, container technology and virtualisation. In addition to sharing your code and data, also share your notebooks, images, or other analysis materials or software dependencies.

Sharing sensitive and proprietary data

With appropriate data management planning, much sensitive and proprietary data can be shared, reused, and made FAIR. The metadata can almost always be shared. Guidance and best practices for sharing sensitive data are necessarily region-specific because of differing regulations, e.g. POPIA for researchers in South Africa.

Consult with your ethics review board on de-identification of personal research data. Some datasets can never be safely de-identified and shared. Researchers can still improve the openness of research on such data by creating and sharing synthetic data. Synthetic data are similar in structure, content, and distribution to the real data and aim to attain "analytic validity": statistical analysis should return the same results for the synthetic data as for the real data.
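
The idea behind synthetic data can be sketched in a few lines of Python. The example below fits a normal distribution to a single numeric column and draws new values from it, so the synthetic column has a similar mean and spread without containing any original record. This is a deliberately naive sketch under stated assumptions: the function name and the example values are made up, the normal distribution is an assumption about the data, and the approach neither preserves relationships between variables nor gives any formal privacy guarantee. Real projects should use dedicated synthetic data tools and seek ethics advice.

```python
import random
import statistics


def synthesize_column(real_values, n, seed=None):
    """Draw n synthetic values from a normal distribution fitted to real_values."""
    rng = random.Random(seed)  # seeded generator for reproducible output
    mean = statistics.mean(real_values)
    stdev = statistics.stdev(real_values)
    return [rng.gauss(mean, stdev) for _ in range(n)]


real_ages = [34, 45, 29, 51, 38, 42, 47, 31, 36, 40]  # made-up example values
synthetic_ages = synthesize_column(real_ages, n=1000, seed=42)

# Compare summary statistics of the real and the synthetic column.
print(round(statistics.mean(real_ages), 1), round(statistics.stdev(real_ages), 1))
print(round(statistics.mean(synthetic_ages), 1), round(statistics.stdev(synthetic_ages), 1))
```

Comparing summary statistics in this way is a simple first check of analytic validity. A fuller check would rerun the planned analyses on both datasets and compare the results.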