Research Data Management

Why document data?

Clear and thorough data documentation helps ensure that your data can be understood, interpreted, and reused. Data documentation can take different forms, but it is commonly a text document that describes the content of a dataset in detail, including the file formats used, the way the data is structured, and the relationships among different components of the dataset.

Items that could be included in data documentation are:

  • your organizational system (e.g. file naming conventions, folder structure)
  • methods used to collect, process, clean, and transform data
  • the structure of the dataset and the relationships between files
  • meanings of fields in spreadsheets, units used, and meanings of acronyms or abbreviations
  • changes made to the dataset over time
  • known problems and limitations
  • a recommended citation
  • licensing information indicating how others are permitted to use the dataset
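
One way to record the meanings of spreadsheet fields and their units is a small data dictionary saved alongside the data. The following is a minimal sketch in Python, not a required format: the field names, meanings, and column headings are hypothetical placeholders, and the function name `data_dictionary_csv` is invented for illustration.

```python
import csv
import io

# Hypothetical entries for an illustrative spreadsheet; replace with your own fields.
DATA_DICTIONARY = [
    {"field": "site_id", "meaning": "Identifier of the sampling site", "unit": "", "notes": "Matches site_id in sites.csv"},
    {"field": "temp_c", "meaning": "Water temperature", "unit": "degrees Celsius", "notes": ""},
    {"field": "do_mg_l", "meaning": "Dissolved oxygen (DO)", "unit": "mg/L", "notes": ""},
]


def data_dictionary_csv(rows):
    """Return the data dictionary as CSV text with one row per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["field", "meaning", "unit", "notes"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


dictionary_text = data_dictionary_csv(DATA_DICTIONARY)
print(dictionary_text)
```

The resulting text can be saved as a plain CSV file next to the dataset so that field meanings, units, and abbreviations travel with the data.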

README Files

It is common to format documentation for a dataset in what is known as a README file. A README file is a plain text file that includes information that will make it easier for researchers to understand and reuse the dataset. Some disciplines have established standards for what should be included in a README file. If your discipline has no such standard, consider including:

  • Descriptions of each file in the dataset (including their formats)
  • Names, contact information, and affiliations of PIs and co-investigators
  • Dates of data collection
  • Location information (if relevant)
  • Key words
  • Variable names and definitions
  • Units of measurement
  • Licensing and access information
  • Recommended citations

An example of a dataset with a README is available in the University of Virginia's Dataverse repository.
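
The README elements listed above could be assembled into a plain text file with a short script. The sketch below is one possible layout in Python, assuming no disciplinary standard applies. The function name `build_readme`, the section headings, and every value in the example call are hypothetical placeholders rather than a prescribed template.

```python
def build_readme(title, investigators, collection_dates, files,
                 variables, license_text, citation,
                 location=None, keywords=()):
    """Return plain-text README content covering the suggested elements."""
    lines = [title, "=" * len(title), ""]

    # Names, contact information, and affiliations of PIs and co-investigators
    lines.append("INVESTIGATORS")
    for person in investigators:
        lines.append(f"  {person['name']}, {person['affiliation']}, {person['contact']}")
    lines.append("")

    # Dates, optional location, and optional keywords
    lines.append(f"DATES OF DATA COLLECTION: {collection_dates}")
    if location:
        lines.append(f"LOCATION: {location}")
    if keywords:
        lines.append("KEYWORDS: " + ", ".join(keywords))
    lines.append("")

    # Description of each file, including its format
    lines.append("FILES")
    for name, description in files.items():
        lines.append(f"  {name}: {description}")
    lines.append("")

    # Variable names, definitions, and units of measurement
    lines.append("VARIABLES")
    for name, (definition, unit) in variables.items():
        lines.append(f"  {name}: {definition} (unit: {unit})")
    lines.append("")

    lines.append(f"LICENSE AND ACCESS: {license_text}")
    lines.append(f"RECOMMENDED CITATION: {citation}")
    return "\n".join(lines) + "\n"


# All values below are made-up placeholders for illustration only.
readme_text = build_readme(
    title="Example Stream Survey",
    investigators=[{
        "name": "A. Researcher",
        "affiliation": "Example University",
        "contact": "a.researcher@example.edu",
    }],
    collection_dates="2020-01-01 to 2020-12-31",
    files={"measurements.csv": "Field measurements, CSV format"},
    variables={"temp_c": ("Water temperature", "degrees Celsius")},
    license_text="CC BY 4.0",
    citation="Researcher, A. (2021). Example Stream Survey [dataset].",
    location="Hypothetical site",
    keywords=("water quality", "streams"),
)
print(readme_text)
```

The returned text can then be saved as README.txt in the top-level folder of the dataset, for example with `pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8")`.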