Metadata is often defined as "data about data," a description that fails to capture why metadata is crucial to any data management plan. Paul Miller, in "Metadata: What it means for memory institutions," provides a richer description of metadata's utility and value:
In essence, metadata is the extra baggage associated with any resource that enables a real or potential user to find that resource; to decide whether or not it is of value to them; to discover where, when and by whom it was created, as well as for what purpose; to know what tools will be needed to manipulate the resource; to determine whether or not they will actually be allowed access to the resource itself and how much this will cost them. Metadata is, in short, a means by which largely meaningless data may be transformed into information, interpretable and reusable by those other than the creator of the data resource.
Metadata applications and management (2004). Ed. G.E. Gorman and Daniel G. Dorner. Lanham, MD: Scarecrow Press, p. 4.
Metadata allows researchers outside the original research or collection team to
Metadata is structured information that can be associated with many different types of information objects, such as books, journal articles, photographs, and research data. Metadata enables the discovery, use, exchange, storage, and preservation of those objects.
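To make "structured information" concrete, here is a minimal sketch of a metadata record in Python. The field names are hypothetical, chosen only to echo the questions in Miller's description (who created the resource, when, for what purpose, what tools are needed, and whether access is allowed). The values are invented for illustration, and the record does not follow any particular metadata standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MetadataRecord:
    """Illustrative record; every field name here is an assumption."""
    title: str
    creator: str          # by whom the resource was created
    created: str          # when it was created
    purpose: str          # for what purpose
    required_tools: list  # what is needed to open or manipulate it
    access: str           # whether and how it may be accessed


# Invented example values for an imaginary dataset.
record = MetadataRecord(
    title="Stream temperature survey",
    creator="Example Research Group",
    created="2020-06-01",
    purpose="Baseline monitoring",
    required_tools=["any CSV reader"],
    access="open",
)

# Serialize to JSON so the description can travel alongside the data.
serialized = json.dumps(asdict(record), indent=2)
print(serialized)
```

Keeping the description in a machine-readable form such as JSON is one way to let both people and software find and interpret the resource. A disciplinary standard would dictate its own field names and format.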
For research data, metadata often includes:
Many academic disciplines and international organizations have created or adopted metadata standards based on the needs of their community of practice. Using the correct metadata standard for your research data can be important if you want to deposit the data in a discipline-specific repository.
The Digital Curation Centre (DCC), a UK-based data management organization with a global scope, maintains some of the best listings of metadata standards, along with associated tools and use cases. The DCC provides expert advice and practical help on how to store, manage, protect, and share digital research data.
There are some additional metadata standards for the humanities and social sciences.
It is common to format documentation for a dataset in what is known as a README file. A README file is a plain text file that includes information that will make it easier for researchers to understand and reuse the dataset. Some disciplines have established standards for what should be included in a README file. Absent these standards, you may consider including:
An example of a dataset with a README is available in the University of Virginia's Dataverse repository.
A data dictionary describes all the data stored in a data set or database. It should include the types of data, attributes, structure, and relationships. If the data are used in a database or software program, their relationship to other portions of the program or system should be described. A good data dictionary can be a valuable part of the metadata describing a data set, giving a user a clear understanding of the content and organization of the data and of how it could be modified, if necessary. In the context of a database or software package, the data dictionary may be an essential piece of software that programmers and the database management system require to access and use the data properly. The user view of a data dictionary is usually presented as a table or spreadsheet. Dictionaries may also be incorporated into XML files or other markup languages. A data dictionary does not contain the data; it only describes it.
A data dictionary typically contains a list of all files in the database, names for each file, the type of data included, a list of all field names and variable names, a description of the information contained in each field, and the various attributes of each field. These may include type (text, date, numeric, etc.), standard formats, units, field length, description, unique identifiers, default values, whether a value is required or not, and more, depending on the specific data.
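As a rough illustration of the attributes listed above, the following Python sketch models a small data dictionary and renders it as the kind of table a user would read. The `FieldEntry` class, the field names, and the sample entries are all hypothetical. They describe an imaginary survey dataset and are not drawn from any standard.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class FieldEntry:
    """One row of a data dictionary: it describes a field, never holds its data."""
    name: str
    type: str             # e.g. "text", "date", "numeric"
    description: str
    units: str = ""
    required: bool = False
    default: Optional[str] = None


# Hypothetical entries for an imaginary field-survey dataset.
DICTIONARY = [
    FieldEntry("site_id", "text", "Unique identifier for the sampling site", required=True),
    FieldEntry("sample_date", "date", "Date the sample was collected (YYYY-MM-DD)", required=True),
    FieldEntry("water_temp", "numeric", "Water temperature at collection", units="degrees Celsius"),
]


def to_csv(entries):
    """Render the dictionary as CSV text, one row per described field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(FieldEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def missing_required(record, entries):
    """Return names of required fields that are absent or empty in a data record."""
    return [e.name for e in entries if e.required and not record.get(e.name)]


print(to_csv(DICTIONARY))
print(missing_required({"site_id": "A1", "water_temp": "12.5"}, DICTIONARY))
```

The second function hints at why a dictionary is useful beyond documentation: once the "required" attribute is recorded, a script can check incoming records against it. A real project would likely track more attributes, such as field length or standard formats.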
For some examples of data dictionaries, check the following sites: