
Select and prepare your data


Not all data has to be published, and in some cases data cannot be published. The rule for opening up research data is: "As open as possible, as closed as necessary". You therefore need to select which data to publish.

 

Selecting the data to be published:

Pay attention to the legislation applicable to your data, your consortium agreement, any agreement you may have with your funder or any other contract to find out whether there are any restrictions on sharing your data and whether you must retain or destroy certain data.

If your dataset includes personal data, you must comply with the GDPR. Under the GDPR, personal data may be retained only for as long as necessary to achieve the purposes for which it was originally processed, and must then be securely destroyed. Exceptions may apply for scientific, statistical or historical purposes. You can find more information on the GDPR here.

In addition to personal data, there may be other restrictions on opening and sharing data. For example, does your dataset contain confidential data, data protected by copyright, data with commercial potential, or data whose sharing would breach a prior commitment (for example, a consortium agreement)?

For data that is not subject to restrictions, you can choose which data to keep, taking into account its uniqueness, its long-term value and its potential for re-use. For example, you may want to keep certain data to validate the results of your publication, for teaching purposes or for future research. However, you should also take into account the costs (time, software, etc.) and effort required to preserve the data (preparation, documentation, storage, etc.).

Depending on these various aspects, you can specify a retention period. Some data will be obsolete in 2, 5, 10 or 50 years, depending on the research subject.

The Digital Curation Centre suggests five steps to decide what data to keep*:

* Whyte, A., Five steps to decide what data to keep: a checklist for appraising research data (v.1), Edinburgh: Digital Curation Centre, 2014.

Preparing selected data for publication:

For published open data to be useful, i.e. reusable, it needs to be prepared. More specifically, this means documenting the data, choosing open (non-proprietary) file formats and adding metadata.
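
As an illustration of these three preparation steps, here is a minimal Python sketch that writes a table to CSV (an open format) and stores descriptive metadata in a JSON file alongside it. The function name `export_open_dataset`, the file names `data.csv` and `metadata.json`, and the metadata fields are assumptions chosen for this example, not a required standard. Your repository may expect a specific metadata schema instead.

```python
import csv
import json
from pathlib import Path


def export_open_dataset(rows, fieldnames, out_dir, metadata):
    """Write rows to a CSV file plus a JSON metadata file in out_dir.

    Hypothetical helper for illustration; file names and metadata
    fields are assumptions, not a prescribed standard.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # CSV is a plain-text, non-proprietary format readable by most tools.
    data_path = out_dir / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # The metadata file documents the data file and its columns.
    record = dict(metadata, file=data_path.name, columns=list(fieldnames))
    meta_path = out_dir / "metadata.json"
    meta_path.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return data_path, meta_path


if __name__ == "__main__":
    # Example values are invented for demonstration only.
    export_open_dataset(
        rows=[{"site": "A", "temp_c": 12.5}, {"site": "B", "temp_c": 9.1}],
        fieldnames=["site", "temp_c"],
        out_dir="published_dataset",
        metadata={
            "title": "Example temperature readings",
            "description": "temp_c is the air temperature in degrees Celsius.",
            "license": "CC-BY-4.0",
        },
    )
```

Keeping the documentation in a separate, machine-readable file means that the column descriptions and licence travel with the data wherever it is deposited.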