
Research Data Management

This guide will assist researchers in planning for the various stages of managing their research data and in preparing data management plans required with funding proposals.

Metadata for Data

Metadata is often defined as "data about data," a description that fails to capture why metadata is crucial to any data management plan. Paul Miller, in "Metadata: What it means for memory institutions," provides a richer description of metadata's many uses and its value:

In essence, metadata is the extra baggage associated with any resource that enables a real or potential user to find that resource; to decide whether or not it is of value to them; to discover where, when and by whom it was created, as well as for what purpose; to know what tools will be needed to manipulate the resource; to determine whether or not they will actually be allowed access to the resource itself and how much this will cost them. Metadata is, in short, a means by which largely meaningless data may be transformed into information, interpretable and reusable by those other than the creator of the data resource.

Metadata applications and management (2004), ed. G.E. Gorman and Daniel G. Dorner. Lanham, MD: Scarecrow Press, p. 4.

Metadata allows researchers outside the original research or collection team to:

  • Find the data
  • Know who created the data or contributed to the creation of the data (i.e. a funder)
  • Understand how the data was created and manipulated
  • Know when the data was created
  • Determine tools needed to view, manipulate, and use the data
  • Understand rights and use conditions surrounding the data
  • Connect to related information objects

Metadata standards for data

Metadata is structured information and can be associated with many different types of information objects, such as books, journal articles, and photographs, as well as research data. Metadata enables the discovery, use, exchange, storage, and preservation of those objects.

For research data, metadata often includes:

  • Information about the researchers involved with the data creation
  • Name and title of the data set
  • Dates associated with the creation of the data
  • Abstract or brief description of the data
  • Terms and conditions associated with the data
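To make the list above concrete, here is a minimal sketch of a metadata record for a dataset, written as a Python dictionary and serialized to JSON. The field names (`creators`, `title`, `dates`, `abstract`, `terms_of_use`) and all of the values are hypothetical, chosen only for illustration; a repository or disciplinary metadata standard will prescribe its own element names and required fields.

```python
import json
from datetime import date

# Hypothetical field names for illustration only; a real metadata standard
# or repository will define its own elements and requirements.
REQUIRED_FIELDS = ["creators", "title", "dates", "abstract", "terms_of_use"]


def build_record(creators, title, created, abstract, terms_of_use):
    """Assemble a minimal metadata record covering the elements listed above."""
    return {
        "creators": creators,                       # researchers involved in creating the data
        "title": title,                             # name and title of the dataset
        "dates": {"created": created.isoformat()},  # creation date as an ISO 8601 string
        "abstract": abstract,                       # brief description of the data
        "terms_of_use": terms_of_use,               # terms and conditions (e.g., a license)
    }


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical example values.
record = build_record(
    creators=[{"name": "Jane Doe", "affiliation": "Example University"}],
    title="Stream Temperature Observations, 2021",
    created=date(2021, 9, 30),
    abstract="Hourly water temperature readings from three sites.",
    terms_of_use="CC BY 4.0",
)
print(json.dumps(record, indent=2))
print(missing_fields(record))  # []
```

Checking for missing fields before deposit is a simple way to catch incomplete records early; a repository's own submission form usually performs a similar check.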

Many academic disciplines and international organizations have created or adopted metadata standards based on the needs of their communities of practice. Using the correct metadata standard for your research data can be important if you want to deposit your data in a discipline-specific repository.

The Digital Curation Centre (DCC), a UK-based data management organization with a global scope, maintains some of the best listings of metadata standards, along with associated tools and use cases. The DCC provides expert advice and practical help on how to store, manage, protect, and share digital research data.

Additional metadata standards are available for the humanities and social sciences.
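As an illustration of what following a metadata standard looks like in practice, the sketch below expresses a dataset description using elements from the general-purpose Dublin Core element set (`title`, `creator`, `date`, `description`, `rights`) and serializes it as XML with Python's standard library. The `metadata` wrapper element and the example values are hypothetical; a disciplinary standard listed by the DCC may use a different schema entirely.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set, version 1.1 namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements defined in the Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_xml(fields):
    """Serialize (element, value) pairs as Dublin Core elements inside a
    hypothetical <metadata> wrapper; reject names outside the element set."""
    root = ET.Element("metadata")
    for element, value in fields:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values.
xml_text = dublin_core_xml([
    ("title", "Stream Temperature Observations, 2021"),
    ("creator", "Doe, Jane"),
    ("date", "2021-09-30"),
    ("description", "Hourly water temperature readings from three sites."),
    ("rights", "CC BY 4.0"),
])
print(xml_text)
```

Rejecting unknown element names is one small benefit of adopting a standard: the vocabulary is fixed, so records from different researchers can be compared and exchanged.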

README Files

It is common to format documentation for a dataset as what is known as a README file. A README file is a plain text file that includes information that will make it easier for other researchers to understand and reuse the dataset. Some disciplines have established standards for what should be included in a README file. In the absence of such standards, you may consider including:

  • Descriptions of each file in the dataset (including their formats)
  • Names, contact information, and affiliations of PIs and co-investigators
  • Dates of data collection
  • Location information (if relevant)
  • Key words
  • Variable names and definitions
  • Units of measurement
  • Licensing and access information
  • Recommended citations
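If you start many datasets, it can help to generate a README skeleton from the checklist above and fill it in as the project progresses. The following is a minimal sketch; the section headings, the underline style, and the "TODO" placeholder are assumptions for illustration, not a required format, so adjust them to your discipline's conventions.

```python
# Section headings mirroring the README checklist; adjust as needed.
SECTIONS = [
    "Files in this dataset (names and formats)",
    "Investigators (names, contact information, affiliations)",
    "Dates of data collection",
    "Location information",
    "Keywords",
    "Variable names and definitions",
    "Units of measurement",
    "Licensing and access",
    "Recommended citation",
]


def readme_template(title, filled=None):
    """Build plain-text README content; unfilled sections get a TODO marker."""
    filled = filled or {}
    lines = [title, "=" * len(title), ""]
    for heading in SECTIONS:
        lines.append(heading)
        lines.append("-" * len(heading))
        lines.append(filled.get(heading, "TODO"))
        lines.append("")
    return "\n".join(lines)


# Hypothetical title and keywords.
text = readme_template(
    "Stream Temperature Observations, 2021",
    {"Keywords": "stream temperature; hydrology"},
)
print(text)
```

Saving the output as `README.txt` alongside the data keeps the documentation in plain text, readable without special software. The remaining "TODO" markers make incomplete sections easy to find before the dataset is shared.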

An example of a dataset with a README file is available in the University of Virginia's Dataverse repository.

Data dictionaries

A data dictionary describes all the data stored in a dataset or database. It should include the types of data, their attributes, structure, and relationships. If the data is used in a database or software program, its relationships to other parts of the program or system should also be described. A good data dictionary can be a valuable part of the metadata describing a dataset, giving users a clear understanding of the content and organization of the data and of how it could be modified, if necessary. In the context of a database or software package, the data dictionary may be an essential component that programmers and the database management system require to access and use the data properly. The user view of a data dictionary is usually presented as a table or spreadsheet, but dictionaries may also be incorporated into XML files or other markup languages. A data dictionary does not contain the data; it only describes it.
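Because a data dictionary describes the data rather than containing it, a first draft can often be generated from the data file itself and then completed by hand. The sketch below reads a small CSV file and produces one dictionary entry per column: field name, a guessed type, and a count of missing values. The type-guessing rules are a deliberately simple assumption (dates, for instance, are reported as "text"), and the descriptions are left blank because only someone who knows the data can write them.

```python
import csv
import io


def _parses_as(values, cast):
    """True if every value can be converted with cast (e.g., int or float)."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def infer_type(values):
    """Guess a coarse type for a column from its non-empty values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "unknown"
    if _parses_as(non_empty, int):
        return "integer"
    if _parses_as(non_empty, float):
        return "numeric"
    return "text"


def dictionary_skeleton(csv_text):
    """Return one entry per column describing, not copying, the data."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    skeleton = []
    for name in reader.fieldnames:
        column = [row[name] or "" for row in rows]
        skeleton.append({
            "field_name": name,
            "type": infer_type(column),
            "missing_values": sum(1 for v in column if not v.strip()),
            "description": "",  # to be written by the research team
        })
    return skeleton


# Hypothetical sample data.
sample = "site_id,date,temp_c\nA01,2021-06-01,14.2\nB02,2021-06-01,\n"
skeleton = dictionary_skeleton(sample)

# Present the dictionary as a table, the usual "user view".
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(skeleton[0]))
writer.writeheader()
writer.writerows(skeleton)
print(out.getvalue())
```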

A data dictionary typically contains a list of all files in the database, the name of each file, the type of data included, a list of all field and variable names, a description of the information contained in each field, and the various attributes of each field. Depending on the specific data, these attributes may include type (text, date, numeric, etc.), standard formats, units, field length, unique identifiers, default values, whether a value is required, and more.
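Field attributes such as type, units, length, required status, and default values are also what make a data dictionary useful for checking data. The sketch below encodes a small, hypothetical dictionary as Python dataclasses and validates individual records against it. The field names, the three type labels, and the rule that a default is substituted for an empty value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """One data dictionary entry (attribute names are illustrative)."""
    name: str
    data_type: str                 # "text", "integer", or "numeric"
    description: str
    units: Optional[str] = None
    required: bool = False
    max_length: Optional[int] = None
    default: Optional[str] = None  # assumed to fill in empty values


CASTS = {"text": str, "integer": int, "numeric": float}

# Hypothetical dictionary for a small temperature dataset.
DICTIONARY = [
    Field("site_id", "text", "Monitoring site code", required=True, max_length=4),
    Field("temp_c", "numeric", "Water temperature", units="degrees Celsius", required=True),
    Field("depth_m", "numeric", "Sensor depth below surface", units="metres", default="0.5"),
]


def validate(row, dictionary=DICTIONARY):
    """Return a list of problems found when checking one record."""
    errors = []
    for field in dictionary:
        value = row.get(field.name) or ""
        if value == "" and field.default is not None:
            value = field.default
        if value == "":
            if field.required:
                errors.append(f"{field.name}: required value missing")
            continue
        if field.max_length is not None and len(value) > field.max_length:
            errors.append(f"{field.name}: longer than {field.max_length} characters")
        try:
            CASTS[field.data_type](value)
        except ValueError:
            errors.append(f"{field.name}: expected {field.data_type}, got {value!r}")
    return errors


print(validate({"site_id": "A01", "temp_c": "14.2"}))  # []
print(validate({"site_id": "A01", "temp_c": "warm"}))  # ["temp_c: expected numeric, got 'warm'"]
```

Keeping the units and descriptions in the same structure as the validation rules means the documentation and the checks cannot drift apart.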


For some examples of data dictionaries, check the following sites: