Good documentation ensures your data can be understood, validated, reused—and reproduced. It also helps you reflect on your workflows and align your data handling with disciplinary standards.
What you need to know about metadata
Metadata is data about data – i.e. how it was collected, in what format, and under what conditions. It supports data discovery, reuse, and long-term accessibility.
Metadata should be:
Descriptive: what the data contains
Contextual: how and why it was created
Accessible: ideally machine-readable and based on standard schemas
Some metadata is generated automatically; some must be added manually. Always follow standards relevant to your field.
HowtoFAIR.dk/metadata, DOI:10.5281/zenodo.3712064
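As an illustration of the three qualities above, the following minimal sketch stores descriptive and contextual information in a machine-readable JSON record. The function name, field names, and example values are hypothetical and do not follow any particular schema; in practice, use the standard schema for your field.

```python
import json


def build_metadata(title, description, file_format, method, purpose, collected):
    """Group metadata into descriptive and contextual parts (illustrative field names)."""
    return {
        # Descriptive: what the data contains
        "descriptive": {
            "title": title,
            "description": description,
            "format": file_format,
        },
        # Contextual: how and why it was created
        "contextual": {
            "method": method,
            "purpose": purpose,
            "collected": collected,
        },
    }


# Hypothetical example values, for illustration only
record = build_metadata(
    title="Example soil samples",
    description="pH measurements at two sampling depths",
    file_format="CSV",
    method="Field sampling with a handheld pH meter",
    purpose="Illustrate a metadata record",
    collected="2024-03-05",
)

# Accessible: JSON is plain text and machine-readable
metadata_json = json.dumps(record, indent=2, sort_keys=True)
print(metadata_json)
```

Saving such a record next to the data file keeps the description and the data together.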
The best time to capture metadata is during the research process, while the data are still in active use and the context is fresh. Documenting early helps ensure accuracy and completeness, and reduces the risk of important details being lost later.
For others to be able to understand and reuse your data, documentation at the study level is important.
Study level means that the documentation describes the data in relation to the entire study or project—that is, the overall purpose, context, and design.
This differs from, for example, file level (which concerns how individual data files are structured) or variable level (which describes the individual fields/questions).
Study-level documentation can include:
README files with key information (variables, structure, naming conventions, software used)
Scripts and tools for accessing or analyzing data
References to related literature
Project purpose and context
Data sources and origin (e.g., from existing databases)
A history of changes made to the data
Consistent file naming and versioning are an important part of metadata and documentation. Clear names and version control make it easier to link files with their metadata, ensure completeness, and maintain a reliable research record.
Why it matters for you as a researcher:
Clarity & discoverability: Descriptive file names help you and others quickly identify what a file contains, without having to open it.
Reproducibility & integrity: Versioning shows how files and data evolve over time, supporting transparent and reproducible research.
Collaboration: Agreed naming conventions make teamwork smoother and prevent errors caused by duplicate or outdated files.
Documentation flow: When file names match your metadata records, it is easier to cross-reference files, track sources, and connect documentation with the underlying data.
Practical tips:
Use descriptive elements such as project name, data type, and date (YYYYMMDD) or version number (v01, v02).
Keep names independent of file location.
For large sets of files, use bulk renaming tools (Bulk Rename Utility, Ant Renamer).
For code or collaborative text, consider version control systems (e.g., Git).
A well-defined naming and versioning strategy strengthens your metadata, making your research easier to manage, share, and reuse.
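The naming tips above can be sketched as a small helper that assembles a file name from a project name, data type, date (YYYYMMDD), and version number. The function name, the underscore separator, and the example values are assumptions made for illustration; adapt them to the convention your team agrees on.

```python
import re
from datetime import date


def build_filename(project, data_type, day, version, extension):
    """Compose a name of the form project_datatype_YYYYMMDD_vNN.ext."""

    def clean(part):
        # Replace runs of characters other than letters and digits with a
        # hyphen, so the name contains no spaces or special characters
        return re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-").lower()

    stamp = f"{day:%Y%m%d}"    # date as YYYYMMDD
    tag = f"v{version:02d}"    # zero-padded version, e.g. v01, v02
    return f"{clean(project)}_{clean(data_type)}_{stamp}_{tag}.{extension.lstrip('.')}"


# Hypothetical project and file, for illustration only
name = build_filename("Soil Survey", "interviews", date(2024, 3, 5), 2, "csv")
print(name)  # soil-survey_interviews_20240305_v02.csv
```

Because the name carries no folder information, the file can be moved without its name becoming misleading.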
Jorge Cham (c) 2012
Need help getting started?
Create your own file naming conventions in this worksheet.
When research data are produced and made available, it is essential that they are accompanied by clear and structured documentation. A README file or a datasheet serves as a guide that ensures the data can be understood, used, and reused – both by other researchers and by oneself in the longer term.
A README file is a simple text file that accompanies a dataset or a project. It provides an introduction to the content, the purpose, and the main instructions on how the data can be understood and used.
A datasheet is often a more structured and detailed form of documentation. While the README file provides an overview, the datasheet goes deeper into describing the dataset’s creation, characteristics, and limitations.
Why is this important?
Comprehensibility: Descriptions of the dataset’s content, structure, and purpose make it easier to understand and work with the material.
Transparency and quality: Documentation clarifies the methods, assumptions, and limitations underlying the data.
Reproducibility: Enables others to validate results and apply the same approach in their own research.
Reuse and visibility: Well-documented data are more useful and more likely to be shared, cited, and recognized.
A README file or a datasheet does not need to be extensive, but should always include key information about the dataset’s purpose, content, variables, format, and any limitations. This is a simple investment that increases both the value and the integrity of research data.
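As a starting point, the key information listed above (purpose, content, variables, format, limitations) can be collected in a plain-text README. The sketch below is one possible layout, not a standard template; the function name, section headings, and example values are hypothetical.

```python
def render_readme(title, purpose, content, variables, file_format, limitations):
    """Return README text with one section per key piece of information."""
    lines = [title, "=" * len(title), ""]
    lines += ["PURPOSE", purpose, ""]
    lines += ["CONTENT", content, ""]
    lines += ["VARIABLES"]
    for name, meaning in variables.items():
        # One line per variable: name followed by a short explanation
        lines.append(f"  {name}: {meaning}")
    lines += ["", "FORMAT", file_format, ""]
    lines += ["LIMITATIONS", limitations]
    return "\n".join(lines) + "\n"


# Hypothetical dataset description, for illustration only
readme_text = render_readme(
    title="Example dataset",
    purpose="Illustrate the structure of a README file.",
    content="One table with one row per soil sample.",
    variables={
        "sample_id": "unique sample identifier",
        "depth_cm": "sampling depth in centimetres",
        "ph": "measured pH value",
    },
    file_format="CSV, UTF-8, comma-separated",
    limitations="Illustrative values only; not real measurements.",
)
print(readme_text)
```

The resulting text can be saved as README.txt alongside the data files it describes.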
Research data can appear in many different forms: text, numbers, databases, geodata, images, etc.
By choosing open and standardized formats, you increase the chances that both you and others will be able to access and reuse your data in the future, independent of specific software or equipment. Find and explore standards at, e.g., FAIRsharing and the Digital Curation Centre.
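As one example of an open format, tabular data can be stored as CSV, which is plain text and readable without specialised software. The sketch below uses Python's standard csv module; the function name, column names, and values are hypothetical, and CSV is only one of many open formats, so check which standards apply in your field.

```python
import csv
import io

# Hypothetical records, for illustration only
rows = [
    {"sample_id": "S01", "depth_cm": 10, "ph": 6.8},
    {"sample_id": "S02", "depth_cm": 20, "ph": 7.1},
]


def to_csv_text(records):
    """Serialise a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)

# Reading the text back shows that the structure survives the round trip;
# note that CSV stores every value as text
restored = list(csv.DictReader(io.StringIO(csv_text)))
print(restored[1]["ph"])
```

Since CSV does not record data types or units, that information belongs in the accompanying README or metadata record.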