Guidelines to research data management at the UFS:
There are many decisions to make about data management even before data creation/collection - choosing hardware and software, intellectual property rights, ethics, regulatory issues, and more. These decisions will affect how data is accessed, used and preserved in the future. The best starting point is a data management plan, whether it is a funder requirement or not. Even informally noting your research plans and project guidelines can make your life easier.
Data management planning improves efficiency in research, ensures information is protected, allows results to be checked by others, improves reproducibility, improves exposure via sharing, and allows compliance with funders' policies. A Data Management Plan (DMP) is a key element of good data management. The DMP will describe the data management life cycle for the data to be collected, generated and processed.
Source: University of California, Santa Cruz: The Research Data Management Lifecycle
A DMP is intended to be a living document in which information can be made available in more detail through updates and periodic reviews as the research project progresses, and when significant changes occur. DMPs should therefore have clear version numbers.
The University of Cambridge summarises data management planning:
Tools and templates to create a DMP:
DMP tools available online:
Intellectual Property Rights (IPR), like copyright and patents, affect how you and others can use your research outputs, including your research data, in terms of its dissemination, future related research projects and associated profit and credit.
Who can help you with IPR questions?
At the UFS you can get more information at the Innovation office on IPR and on the Research website on Legislation.
What about data I find online?
Check to see how the data is licensed - IPR still applies even if you don't see a © symbol or an 'all rights reserved' notice. Creative Commons licensing will give you an idea of different licensing options and what to look for. And always cite the data. DataCite provides guidelines on how to do this.
Ethical guidelines are provided by funders and also by the University. In addition, laws governing personal data must be adhered to.
It costs money to keep all your data and files for future use, and it can be confusing to find specific items later. Selecting what data to retain, and for how long, will save storage space and staff hours ... in other words, money and time. Selecting what to keep or dispose of will involve subjective judgement, since it is not possible to know exactly what might be needed in future. The best way to select what to keep or delete is to abide by relevant funder/institutional policies and document all decisions (including the reasons).
The Digital Curation Centre lists questions that can help you decide what to keep and what to delete:
Make sure you are aware of your responsibilities in terms of data protection if you need to store your data.
File formats might be dictated by the software you use, or by the conventions of your discipline. Sometimes you will have to choose from various formats. During the planning phase, keep the following in mind:
Here are some good file formats for the preservation of specific types of data:
Type of data | Recommended formats | Acceptable formats |
---|---|---|
Tabular data with extensive metadata | SPSS portable format (.por) | Proprietary formats of statistical packages: SPSS (.sav), Stata (.dta), MS Access (.mdb/.accdb) |
Tabular data with minimal metadata | Comma-separated values (.csv) | Delimited text (.txt) with characters not present in data used as delimiters |
Geospatial data | ESRI Shapefile (.shp, .shx, .dbf, .prj, .sbx, .sbn optional) | ESRI Geodatabase format (.mdb) |
Textual data | Rich Text Format (.rtf) | Hypertext Mark-up Language (.html) |
Image data | TIFF 6.0 uncompressed (.tif) | JPEG (.jpeg, .jpg, .jp2) if original created in this format |
Audio data | Free Lossless Audio Codec (FLAC) (.flac) | MPEG-1 Audio Layer 3 (.mp3) if original created in this format |
Video data | MPEG-4 (.mp4) | AVCHD video (.avchd) |
Documentation and scripts | Rich Text Format (.rtf) | Plain text (.txt) |
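The "delimited text" option in the table depends on using a delimiter that does not occur anywhere in the data. A minimal Python sketch of that check follows; the function name `choose_delimiter`, the candidate list and the sample rows are illustrative assumptions, not part of any UFS guideline.

```python
def choose_delimiter(rows, candidates=("\t", "|", ";")):
    """Return the first candidate that appears in no field, or None."""
    for delimiter in candidates:
        # A candidate is safe only if no field in any row contains it.
        if not any(delimiter in field for row in rows for field in row):
            return delimiter
    return None


# Hypothetical sample data: the notes contain ';' and '|', but no tab.
rows = [["site", "notes"], ["A1", "wet; muddy"], ["B2", "dry | windy"]]
delimiter = choose_delimiter(rows)
lines = [delimiter.join(row) for row in rows]
```

If no candidate is safe, the function returns `None`, in which case a quoted format such as CSV is probably the better choice.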
A note on non-proprietary or open formats: these formats are publicly available (open source) and could be supported longer than proprietary software, or, at least, there are no restrictions on their use or code.
It is important to store research data securely and safely in order to avoid loss, damage, theft or unlawful access. Think about the following when deciding where to store research data:
figshare at the University of the Free State can be used for storing and in-project sharing of research data. It provides POPIA- and GDPR-compliant, secure cloud-based storage space, which is regularly backed up. This service is available to research staff and postgraduate students. figshare can be used for short-term or long-term storage of data, as well as sharing data where appropriate.
Storing data on portable storage media (CDs, DVDs and memory sticks) and personal computers and laptops is inadvisable. If you do use any of these, make sure that the device or media is encrypted, and that you have multiple copies in more secure storage space.
Here are some tips for safe data storage and backups.
Storing sensitive or confidential data?
What to back up and when?
Personally identifiable information can be direct (information that, on its own, allows you to identify an individual, e.g. names, email addresses including a name, fingerprints, facial photographs, etc.), strongly indirect (information that allows you to identify an individual through minimal effort, e.g. postal addresses, telephone numbers, URLs of personal pages, etc.), or indirect (information that allows you to identify an individual when linked with other information, e.g. age, location, gender, job title, etc.).
A key approach to protecting personally identifiable information is anonymisation, meaning the irreversible removal and deletion of personal identifiers. In quantitative research, sometimes all that is needed is the removal of direct identifiers. More complex datasets (with more free text) might need more anonymisation techniques. In qualitative research anonymisation is more complicated and needs personal judgement. Here are some best practices for guidance.
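For the simplest quantitative case described above, removing direct identifiers can amount to dropping columns from a table. The Python sketch below illustrates this under stated assumptions: the column names `name` and `email`, the function `drop_direct_identifiers` and the sample row are all made up for the example, and which fields count as identifiers depends on the dataset.

```python
import csv
import io

# Hypothetical column names; adjust to the identifiers in your own dataset.
DIRECT_IDENTIFIERS = {"name", "email"}


def drop_direct_identifiers(csv_text, identifiers=DIRECT_IDENTIFIERS):
    """Return CSV text with the listed identifier columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [col for col in reader.fieldnames if col not in identifiers]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the columns that were dropped.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample record for illustration only.
sample = "name,email,age_group,score\nThandi M,thandi@example.org,30-39,7\n"
anonymised = drop_direct_identifiers(sample)
```

Note that this only removes direct identifiers. Indirect identifiers such as age or location remain in the output and may still need attention, and the original file must also be deleted or secured for the removal to be irreversible.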
Choose a logical and consistent way to name and organise your files - this will allow you to easily locate and use them. Think about your naming conventions and file structure at the beginning of your project to ensure consistency and prevent version control problems.
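One way to keep names consistent is to generate them from the same few elements every time. The Python sketch below assumes a pattern of project, description, ISO date and two-digit version number; the pattern, the function name `build_filename` and the example values are illustrative choices rather than a prescribed UFS convention.

```python
import re
from datetime import date


def build_filename(project, description, day, version, extension):
    """Build a name such as 'soilstudy_field-notes_2024-03-05_v02.csv'."""
    def clean(text):
        # Lower-case, and replace runs of other characters with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{clean(project)}_{clean(description)}_"
            f"{day.isoformat()}_v{version:02d}.{extension}")


# Hypothetical project and file description.
name = build_filename("SoilStudy", "Field notes", date(2024, 3, 5), 2, "csv")
```

Putting the date in year-month-day order and padding the version number means that files sort chronologically and by version in an ordinary directory listing.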
Organising files
File naming
Documentation and metadata
Managing references
Tip: We send and receive a lot of emails every day, making it difficult to track down that important research project related email you urgently need. Organising emails for your research project can help. Archive your old emails. Delete emails that you do not need. Use folders to store messages. Make sure that you use careful version control when sending/receiving attached documents (consider using other, more secure methods to exchange data).
Additional resources:
There is an increasing trend to share research data, from funders who want to avoid duplication of effort and make data available for other researchers to discover, examine and build upon, to the support of a culture of openness that deters fraud and encourages interdisciplinary research. Sharing data without clear terms of use can make re-use even more difficult, since there are many complexities and ambiguities with, for example, the rights of research databases and their various elements.
Assigning Creative Commons licenses to your research data is one way of giving you control over how your data may be used, from putting your data in the public domain, to reserving all rights. Explore other licenses and waivers for your research data.
Read more about data citation below.
Research data preservation is the process of maintaining access to the data so that it can be found, understood and used in the future. Preservation goes beyond immediate storage and back-up issues and must be considered from the point of research data creation and throughout the entire life-cycle of the data.
Consider the following:
(Check out this list of 'endangered' digital materials by the Digital Preservation Coalition.)
Data preservation should be considered as early as possible. Good practice is to add preservation requirements, and how they will be met, to the data management plan (DMP). Data preservation aims to keep both content (data) and context (metadata) safe for future re-use, so it will be significantly easier to do if preservation is already considered during the planning phase.
Read more about preservation strategies by the Digital Curation Centre or check out this how-to guide for beginners in digital preservation.
Citations have always been a key element in scholarly communication. Citing research data in the same way makes proper attribution and credit possible, enables reproducibility of findings, encourages faster research progress, supports collaboration and re-use of data, and provides a way of sharing research data.
DataCite recommends the following citation format, but encourages various disciplines to develop citation systems that work well for them:
Creator (PublicationYear). Title. Publisher. Identifier.
The version of the dataset and the resource type could also be included:
Creator (PublicationYear). Title. Version. Publisher. ResourceType. Identifier.
The identifier is usually a Digital Object Identifier (DOI), an alphanumeric string assigned to uniquely identify a digital object. It is tied to the object's metadata description and URL (location).
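The two citation patterns above can be assembled mechanically from a dataset's metadata. The Python sketch below is one possible way to do so; the function name `format_citation` and every example value (creator, title, DOI) are made up for illustration and do not refer to a real dataset.

```python
def format_citation(creator, year, title, publisher, identifier,
                    version=None, resource_type=None):
    """Join the elements in the order given by the citation format above."""
    parts = [f"{creator} ({year})", title]
    if version:
        parts.append(version)        # optional, placed after the title
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)  # optional, placed after the publisher
    parts.append(identifier)
    return ". ".join(parts) + "."


# Placeholder metadata; the DOI below is not a real identifier.
basic = format_citation("Mokoena, T.", 2023, "Example rainfall dataset",
                        "University of the Free State",
                        "https://doi.org/10.1234/example")
full = format_citation("Mokoena, T.", 2023, "Example rainfall dataset",
                       "University of the Free State",
                       "https://doi.org/10.1234/example",
                       version="Version 2", resource_type="Dataset")
```

Here `basic` follows the shorter pattern and `full` the longer one with version and resource type included.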
When choosing a repository for your data, select one that will also provide your dataset with a DOI. The UFS's figshare repository will provide this service.