Type of Data

RDM

Physical format and structure

Data is often found as a digital file containing numbers and text, but you may need to inventory, process and store other data types, from sound recordings and images to archaeological artefacts.

Some digital files are structured data, generally organised around observations and variables, but some may be unstructured data, which does not follow a set template, for instance Internet content, books, images or emails.

A wide range of data formats can be stored as digital files and/or processed using qualitative data analysis software or content and document management systems.

Regardless of the data format and structure, a consistent approach and inventory are needed. For instance, physical data items should have a unique identifier and be inventoried in a digital file containing a description of each item and other information that may be relevant to their identification and retrieval.
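As an illustration of such an inventory, the following is a minimal sketch that assigns a unique identifier to each physical item and writes the result as a CSV file. The function names, the `ITEM-0001` identifier pattern, the column names and the sample items are assumptions made for this example, not a prescribed scheme.

```python
import csv
import io


def build_inventory(items):
    """Give each physical item a unique, sequential identifier."""
    rows = []
    for number, item in enumerate(items, start=1):
        rows.append({
            "item_id": f"ITEM-{number:04d}",  # hypothetical identifier pattern
            "description": item["description"],
            "location": item["location"],
        })
    return rows


def write_inventory(rows, handle):
    """Write the inventory rows to an open text file as CSV."""
    fieldnames = ["item_id", "description", "location"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample items, for illustration only.
items = [
    {"description": "Pottery shard, glazed", "location": "Cabinet A, drawer 2"},
    {"description": "Field recording, reel tape", "location": "Shelf 4"},
]

inventory = build_inventory(items)

# An in-memory buffer stands in for a real file opened with open(..., newline="").
buffer = io.StringIO()
write_inventory(inventory, buffer)
print(buffer.getvalue())
```

In practice the identifier would also be written on a label attached to the physical item, so that the item and its inventory record can be matched.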

Size

Consider the implications of data volumes in terms of storage, backup and access. Estimate the volume of data in MB/GB/TB and how this will grow to make sure any additional storage and technical support required can be provided.
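A rough estimate of this kind can be sketched as follows. The example assumes linear growth, decimal units (1 TB = 1000 GB) and a fixed number of backup copies; the figures (200 GB initially, 100 GB added per year, two backup copies) are invented for illustration, and real projects may grow very differently.

```python
def projected_volume_gb(initial_gb, yearly_growth_gb, years):
    """Return the expected data volume (GB) for year 0 through `years`,
    assuming the same amount of data is added every year."""
    return [initial_gb + yearly_growth_gb * year for year in range(years + 1)]


def storage_needed_gb(volume_gb, backup_copies):
    """Total storage for the working copy plus its backup copies."""
    return volume_gb * (1 + backup_copies)


def format_size(size_gb):
    """Express a size given in GB as MB, GB or TB (decimal units)."""
    if size_gb >= 1000:
        return f"{size_gb / 1000:.1f} TB"
    if size_gb >= 1:
        return f"{size_gb:.1f} GB"
    return f"{size_gb * 1000:.0f} MB"


# Hypothetical project: 200 GB now, 100 GB more each year, 3-year horizon.
volumes = projected_volume_gb(200, 100, 3)
for year, volume in enumerate(volumes):
    total = storage_needed_gb(volume, backup_copies=2)
    print(f"Year {year}: {format_size(volume)} of data, "
          f"{format_size(total)} including backups")
```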

For large volumes of data, it is important to consider the data format as some formats or software may be more suitable than others.

Digital format

Choosing non-proprietary (open or standard) and lossless file formats helps ensure that your digital data remains accessible and usable in the future. Decisions may be based on staff expertise, a preference for open formats, the standards accepted by data centres or widespread usage within a given community. While proprietary software may be the right solution to process, analyse and store your data, it is important to also save portable, interoperable versions of the data that can be opened using other software. This ensures that the data will remain accessible should a specific software package or version no longer be available.

ANDS (the Australian National Data Service) provides some tips regarding the selection of data formats:

  • Decide and agree on the data format before data collection.
  • Carefully weigh the advantages of proprietary versus open-standard software to ensure that the chosen format meets future access, reuse and storage needs.
  • File formats may become obsolete over time; keep this in mind when choosing.
  • You may want to keep data in two different formats, but make sure that the advantages (limiting the risk of loss and obsolescence) outweigh the disadvantage (storage capacity for two datasets).
  • Keep in mind that high-resolution data may require conversion to another format for ease of online visualisation or transmission by email.
  • Identify the best solution in your field by asking your colleagues or by consulting the ANDS guidance.

For example, a data backup in text formats such as .csv, .tab, .rtf or .txt is compatible with a wide range of software. See also the list of recommended formats from the UK Data Archive.
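The idea of keeping a portable text copy can be sketched as below: records are exported to CSV and read back to check that nothing was lost. The function names and the sample records are assumptions made for this example. Note that CSV stores every value as text, so the variable types have to be documented separately.

```python
import csv
import io


def export_csv(records, fieldnames, handle):
    """Write records (a list of dictionaries) to an open text file as CSV."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


def read_csv(handle):
    """Read a CSV file back into a list of dictionaries (all values are text)."""
    return list(csv.DictReader(handle))


# Hypothetical records; values are kept as text because CSV has no type information.
records = [
    {"site": "A", "temperature_c": "12.5"},
    {"site": "B", "temperature_c": "14.1"},
]

# An in-memory buffer stands in for a backup file on disk.
buffer = io.StringIO()
export_csv(records, ["site", "temperature_c"], buffer)

# Round-trip check: the portable copy holds the same content as the original.
buffer.seek(0)
restored = read_csv(buffer)
print(restored == records)
```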

Variable type

If your data is structured, an important distinction is the type of variables available in your data file. Variable types may be classified as follows:

  • Quantitative variable: variable with numeric, measurable values
    • Discrete: quantity measured as an integer – example: number of people
    • Continuous: quantity measured as a real number (any possible value on a continuum between a minimum and a maximum) – example: temperature
  • Qualitative or categorical variable: variable with values corresponding to pre-defined categories or levels
    • Nominal: levels do not have a specific order – example: eye colour
    • Ordinal: levels are ordered – example: student ranking
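One way to record this classification is a small codebook listing each variable with its type and, for categorical variables, its levels. The sketch below is only an illustration: the `Variable` class, the variable names and the levels are assumptions. It also shows why the distinction matters in practice, since ordinal values should be sorted by the order of their levels rather than alphabetically.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Variable:
    name: str
    kind: str  # "discrete", "continuous", "nominal" or "ordinal"
    levels: Optional[Tuple[str, ...]] = None  # categorical variables only


# Hypothetical codebook covering the four variable types listed above.
codebook = [
    Variable("household_size", "discrete"),
    Variable("temperature_c", "continuous"),
    Variable("eye_colour", "nominal", ("blue", "brown", "green")),
    Variable("satisfaction", "ordinal", ("low", "medium", "high")),
]


def sort_ordinal(values, variable):
    """Sort values by the declared order of an ordinal variable's levels.
    A value that is not a declared level raises KeyError."""
    if variable.kind != "ordinal":
        raise ValueError(f"{variable.name} is not an ordinal variable")
    rank = {level: position for position, level in enumerate(variable.levels)}
    return sorted(values, key=rank.__getitem__)


satisfaction = codebook[3]
print(sort_ordinal(["high", "low", "medium"], satisfaction))
# Alphabetical sorting would give a misleading order for the same values.
print(sorted(["high", "low", "medium"]))
```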

Many data management software packages also include a string or character variable type, which allows free text to be stored in the file.

It is important to identify variable types and ensure that the data is correctly recognised by data management software.
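A simple check of how values would be recognised can be sketched as follows. The function `infer_variable_type` and its `max_categories` threshold are hypothetical; real data management software applies its own rules. The sketch cannot tell nominal from ordinal variables, and it treats numeric category codes (for example 1, 2, 3 for eye colours) as a discrete quantity, which is exactly the kind of misrecognition that needs to be checked by hand.

```python
def infer_variable_type(values, max_categories=10):
    """Guess a variable type from a list of text values.

    Returns "discrete", "continuous", "categorical" or "string".
    This is a heuristic only; the result should be reviewed manually.
    """
    def all_parse(cast):
        # True if every value can be converted with `cast` (int or float).
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "discrete"
    if all_parse(float):
        return "continuous"
    if len(set(values)) <= max_categories:
        return "categorical"
    return "string"


# Hypothetical columns read from a text file, where every value is a string.
print(infer_variable_type(["3", "5", "2"]))
print(infer_variable_type(["12.5", "13.0", "11.8"]))
print(infer_variable_type(["blue", "brown", "blue"]))
```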