Metadata & documentation

Good documentation ensures your data can be understood, validated, reused—and reproduced. It also helps you reflect on your workflows and align your data handling with disciplinary standards.

What you need to know about metadata

Metadata is data about data: information on how the data were collected, in what format, and under what conditions. It supports data discovery, reuse, and long-term accessibility.

Metadata should be:

  • Descriptive: what the data contains
  • Contextual: how and why it was created
  • Accessible: ideally machine-readable and based on standard schemas

Some metadata is generated automatically; some must be added manually. Always follow standards relevant to your field.
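As a sketch of what machine-readable, schema-based metadata can look like, the snippet below serializes a few descriptive fields to JSON. The field names loosely echo Dublin Core terms and the values are invented examples; use whatever schema your discipline or target repository prescribes.

```python
import json

# Illustrative only: field names loosely follow Dublin Core terms,
# and all values are invented for the example.
metadata = {
    "title": "Soil moisture survey 2024",
    "creator": "Example Research Group",
    "description": "Weekly soil moisture readings from 12 field plots.",
    "date": "2024-03-01",
    "format": "text/csv",
    "license": "CC-BY-4.0",
}

# Serializing to JSON makes the record machine-readable and portable.
print(json.dumps(metadata, indent=2))
```

Because the record is structured text rather than free prose, repositories and harvesters can index it automatically.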

Source: HowtoFAIR.dk/metadata, DOI: 10.5281/zenodo.3712064

The best time to capture metadata is during the research process, while your data are still active and fresh in context. Documenting early helps ensure accuracy and completeness, and reduces the risk of important details being lost later.

For others to be able to understand and reuse your data, documentation at the study level is important.

Study level means that the documentation describes the data in relation to the entire study or project—that is, the overall purpose, context, and design.
This differs from, for example, file level (which concerns how individual data files are structured) or variable level (which describes the individual fields/questions).  

Examples of metadata include:

  • Log files from instruments or software

  • Lab notebooks, ideally electronic lab notebooks (ELNs)

  • ReadMe files with key information (variables, structure, naming conventions, software used)

  • Scripts and tools for accessing or analyzing data

  • References to related literature

  • Project purpose and context

  • Data sources and origin (e.g., from existing databases)

  • A history of changes made to the data

Consistent file naming and versioning are an important part of metadata and documentation. Clear names and version control make it easier to link files with their metadata, ensure completeness, and maintain a reliable research record.

Why it matters for you as a researcher:

  • Clarity & discoverability: Descriptive file names help you and others quickly identify what a file contains, without having to open it.
  • Reproducibility & integrity: Versioning shows how files and data evolve over time, supporting transparent and reproducible research.
  • Collaboration: Agreed naming conventions make teamwork smoother and prevent errors caused by duplicate or outdated files.
  • Documentation flow: When file names match your metadata records, it is easier to cross-reference files, track sources, and connect documentation with the underlying data.

Practical tips:

  • Use descriptive elements such as project name, data type, and date (YYYYMMDD) or version number (v01, v02).
  • Keep names independent of file location.
  • For large sets of files, use bulk renaming tools (Bulk Rename Utility, Ant Renamer).
  • For code or collaborative text, consider version control systems (e.g., Git).

A well-defined naming and versioning strategy strengthens your metadata, making your research easier to manage, share, and reuse.
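The elements suggested above can be combined programmatically. The helper below is a hypothetical sketch, not a standard tool: it assembles a location-independent file name from project name, data type, date (YYYYMMDD), and a zero-padded version number.

```python
from datetime import date
from typing import Optional

def make_filename(project: str, datatype: str, version: int,
                  when: Optional[date] = None, ext: str = "csv") -> str:
    """Compose <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>.

    The name carries its own context, so it stays meaningful
    regardless of which folder the file ends up in.
    """
    stamp = (when or date.today()).strftime("%Y%m%d")
    return f"{project}_{datatype}_{stamp}_v{version:02d}.{ext}"

print(make_filename("soilproj", "survey", 2, date(2024, 3, 1)))
# soilproj_survey_20240301_v02.csv
```

Generating names from one function, rather than typing them by hand, also enforces the convention across a whole team.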

Comic: Jorge Cham © 2012
Create your own file naming conventions in this worksheet.

When research data are produced and made available, it is essential that they are accompanied by clear and structured documentation. A README file or a datasheet serves as a guide that ensures the data can be understood, used, and reused – both by other researchers and by oneself in the longer term.

A README file is a simple text file that accompanies a dataset or a project. It provides an introduction to the content, the purpose, and the main instructions on how the data can be understood and used.

A datasheet is often a more structured and detailed form of documentation. While the README file provides an overview, the datasheet goes deeper into describing the dataset’s creation, characteristics, and limitations.

Why is this important?

  • Comprehensibility: Descriptions of the dataset’s content, structure, and purpose make it easier to understand and work with the material.

  • Transparency and quality: Documentation clarifies the methods, assumptions, and limitations underlying the data.

  • Reproducibility: Enables others to validate results and apply the same approach in their own research.

  • Reuse and visibility: Well-documented data are more useful and more likely to be shared, cited, and recognized.

A README file or a datasheet does not need to be extensive, but should always include key information about the dataset’s purpose, content, variables, format, and any limitations. This is a simple investment that increases both the value and the integrity of research data.
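As a sketch of how little is needed, the snippet below writes a minimal README.txt covering those key points. The template fields are illustrative, not a formal standard; extend them to match your dataset.

```python
from pathlib import Path

# Illustrative template: the fields mirror the key information a
# README should carry (purpose, content, variables, format, limitations).
README_TEMPLATE = """\
Dataset: {title}
Purpose: {purpose}
Contents: {contents}
Variables: {variables}
Format: {fmt}
Limitations: {limitations}
"""

def write_readme(folder: Path, **fields: str) -> Path:
    """Render the template and save it as README.txt next to the data."""
    path = folder / "README.txt"
    path.write_text(README_TEMPLATE.format(**fields), encoding="utf-8")
    return path
```

Calling `write_readme(Path("data"), title="...", purpose="...", ...)` produces a plain-text file that travels with the dataset and opens in any editor.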

Research data can appear in many different forms: text, numbers, databases, geodata, images, etc.

By choosing open and standardized formats, you increase the chances that both you and others will be able to access and reuse your data in the future, independent of specific software or equipment. Find and explore standards at, for example, FAIRsharing and the Digital Curation Centre.

Read more about formatting at UK Data Service.