Research Guides: Research Data Management: Describing Data

Metadata for Data

Metadata is often defined as "data about data," a description that fails to capture why metadata is crucial to any data management plan. Paul Miller, in "Metadata: What it means for memory institutions," provides a richer description of metadata's utility and value:

In essence, metadata is the extra baggage associated with any resource that enables a real or potential user to find that resource; to decide whether or not it is of value to them; to discover where, when and by whom it was created, as well as for what purpose; to know what tools will be needed to manipulate the resource; to determine whether or not they will actually be allowed access to the resource itself and how much this will cost them. Metadata is, in short, a means by which largely meaningless data may be transformed into information, interpretable and reusable by those other than the creator of the data resource.

Metadata applications and management (2004), ed. G.E. Gorman and Daniel G. Dorner. Lanham, MD: Scarecrow Press, p. 4.

Metadata allows researchers outside the original research or collection team to:

Find the data
Know who created the data or contributed to the creation of the data (e.g., a funder)
Understand how the data was created and manipulated
Know when the data was created
Determine tools needed to view, manipulate, and use the data
Understand rights and use conditions surrounding the data
Connect to related information objects

Metadata standards for data

Metadata is structured information that can be associated with many different types of information objects, such as books, journal articles, photographs, and research data. Metadata enables the discovery, use, exchange, storage, and preservation of those objects.

For research data, metadata often includes:

Information about the researchers involved with the data creation
Name and title of the data set
Dates associated with the creation of the data
Abstract or brief description of the data
Terms and conditions associated with the data
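The elements listed above can be sketched as a small structured record. The following is a minimal illustration in Python, not an implementation of any particular metadata standard: the class name, the field names, and all of the sample values are hypothetical placeholders chosen for this example.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical minimal record; field names are illustrative only."""
    title: str                # name and title of the data set
    creators: List[str]       # researchers involved in creating the data
    date_created: str         # date associated with creation (ISO 8601 text)
    description: str          # abstract or brief description
    terms_of_use: str         # terms and conditions for the data


def to_json(record: DatasetMetadata) -> str:
    """Serialize the record so it can be stored alongside the data files."""
    return json.dumps(asdict(record), indent=2)


# Placeholder values, not a real dataset.
record = DatasetMetadata(
    title="Example Survey Responses",
    creators=["Researcher, Example A."],
    date_created="2020-01-15",
    description="Placeholder abstract describing what the data set contains.",
    terms_of_use="Placeholder terms and conditions.",
)

print(to_json(record))
```

In practice, a disciplinary standard would dictate the element names and required fields; a record like this is only a starting point for gathering the information.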

Many academic disciplines and international organizations have created or adopted metadata standards based on the needs of their communities of practice. Using the appropriate metadata standard for your research data can be important for including that data in a discipline-specific repository.

The Digital Curation Centre (DCC), a UK-based data management organization with a global scope, has some of the best listings of metadata standards as well as associated tools and use cases. The DCC provides expert advice and practical help on how to store, manage, protect and share digital research data.

Some additional metadata standards for the humanities and social sciences include:

Data Documentation Initiative (DDI)
DDI is an international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. DDI is a free standard that can document and manage different stages in the research data lifecycle, such as conceptualization, collection, processing, distribution, discovery, and archiving. Documenting data with DDI facilitates understanding, interpretation, and use by people, software systems, and computer networks.
Categories for the Description of Works of Art
CDWA is a set of guidelines for the description of art, architecture, and other cultural works. CDWA represents common practice and advises best practice for cataloging, based on surveys and consensus building with the user community.
VRA Core
The VRA Core is a data standard for the description of works of visual culture as well as the images that document them. The standard is hosted by the Network Development and MARC Standards Office of the Library of Congress (LC) in partnership with the Visual Resources Association.
Text Encoding Initiative (TEI)
TEI is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics.

README Files

It is common to format documentation for a dataset in what is known as a README file. A README file is a plain text file that includes information that will make it easier for researchers to understand and reuse the dataset. Some disciplines have established standards for what should be included in a README file. Absent these standards, you may consider including:

Descriptions of each file in the dataset (including their formats)
Names, contact information, and affiliations of PIs and co-investigators
Dates of data collection
Location information (if relevant)
Key words
Variable names and definitions
Units of measurement
Licensing and access information
Recommended citations

An example of a dataset with a README is available in the University of Virginia's Dataverse repository.
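The items suggested above can be assembled into a plain text README with a short script. The following Python sketch is one possible layout, not a disciplinary standard: the function name build_readme, the section labels, and every sample value (file names, people, addresses, license text) are hypothetical placeholders.

```python
def build_readme(title, files, investigators, collection_dates,
                 variables, license_text, citation,
                 location=None, keywords=()):
    """Return README text covering the items suggested in this guide.

    files:         list of (file name, format, description) tuples
    investigators: list of (name, affiliation, contact) tuples
    variables:     list of (name, definition, unit) tuples
    """
    lines = [title, "=" * len(title), "", "FILES"]
    for name, fmt, description in files:
        lines.append(f"  {name} ({fmt}): {description}")

    lines += ["", "INVESTIGATORS"]
    for name, affiliation, contact in investigators:
        lines.append(f"  {name}, {affiliation}, {contact}")

    lines += ["", f"DATES OF DATA COLLECTION: {collection_dates}"]
    if location:  # location is optional ("if relevant")
        lines.append(f"LOCATION: {location}")
    if keywords:
        lines.append("KEYWORDS: " + ", ".join(keywords))

    lines += ["", "VARIABLES"]
    for name, definition, unit in variables:
        lines.append(f"  {name}: {definition} [unit: {unit}]")

    lines += ["",
              f"LICENSE AND ACCESS: {license_text}",
              f"RECOMMENDED CITATION: {citation}"]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
readme_text = build_readme(
    title="Example Dataset",
    files=[("observations.csv", "CSV", "One row per site visit")],
    investigators=[("Example A. Researcher", "Example University",
                    "researcher@example.org")],
    collection_dates="2021-03-01 to 2021-03-31",
    variables=[("temperature_c", "Air temperature", "degrees Celsius")],
    license_text="Placeholder license statement",
    citation="Placeholder citation text",
    location="Placeholder field site",
    keywords=["example", "placeholder"],
)

print(readme_text)
# To save it next to the data: open("README.txt", "w").write(readme_text)
```

Generating the file from structured inputs makes it easier to keep the README consistent across datasets, but a hand-written README containing the same information serves the same purpose.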

Data dictionaries

A data dictionary describes all the data stored in a data set or database. It should include the types of data, attributes, structure, and relationships. If the data is used in a database or software program, its relationship to other portions of the program or system should be described. A good data dictionary can be a valuable part of the metadata describing a data set, enabling a user to get a clear understanding of the content and organization of the data and how it could be modified, if necessary. In the context of a database or software package, the data dictionary may be an essential piece of software that programmers and the database management system require to access and use the data properly. The user view of a data dictionary is usually presented as a table or spreadsheet. Dictionaries may also be incorporated into XML files or other mark-up languages. A data dictionary does not contain the data, but only describes it.

A data dictionary typically contains a list of all files in the database, names for each file, the type of data included, a list of all field names and variable names, a description of the information contained in each field, and the various attributes of each field. These may include type (text, date, numeric, etc.), standard formats, units, field length, description, unique identifiers, default values, whether a value is required or not, and more, depending on the specific data.
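A first draft of such a field-level table can be derived from the data itself and then completed by hand. The following Python sketch assumes a small CSV file with a header row; the function names, the simple type-guessing rules, the output column names, and the sample data are all hypothetical choices made for this illustration. Descriptions and units cannot be inferred and must be supplied by the researcher.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Guess numeric, date (ISO 8601), or text from the non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"

    def all_parse(parser):
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(float):
        return "numeric"
    if all_parse(date.fromisoformat):
        return "date"
    return "text"


def build_data_dictionary(csv_text, descriptions=None, units=None):
    """Return one dictionary entry (a dict) per field in the CSV text."""
    descriptions = descriptions or {}
    units = units or {}
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    entries = []
    for name in rows[0].keys():
        values = [(row[name] or "") for row in rows]
        entries.append({
            "field": name,
            "type": infer_type(values),
            "max_length": max(len(v) for v in values),
            "required": all(v != "" for v in values),   # no blanks observed
            "unique": len(set(values)) == len(values),  # candidate identifier
            "unit": units.get(name, ""),
            "description": descriptions.get(name, ""),
        })
    return entries


# Placeholder data for illustration only.
SAMPLE = (
    "site_id,sample_date,temperature_c,notes\n"
    "A1,2021-03-01,12.5,clear\n"
    "A2,2021-03-02,13.1,\n"
    "A3,2021-03-02,11.8,windy\n"
)

dictionary = build_data_dictionary(
    SAMPLE,
    descriptions={"site_id": "Identifier of the sampling site"},
    units={"temperature_c": "degrees Celsius"},
)

for entry in dictionary:
    print(entry)
```

The "required" and "unique" columns here only report what was observed in the sample, so they are hints to verify rather than rules the data is guaranteed to follow.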

For some examples of data dictionaries, check the following sites:

Data Dictionary Examples – Ag Data Commons – National Agricultural Library - USDA
Sample Dataset 2014 - Statistical Consulting at University Libraries, Kent State University. Click on the link to “Data definitions (*pdf)” in the Sample Data Files section.
Fleet DNA Data Dictionary – National Renewable Energy Laboratory (NREL).
Protein Data Bank Exchange Data Dictionary (PDBx/mmCIF V4.0) – Worldwide Protein Data Bank. There are separate tabs for Category Groups, Data Categories, and Data Items.