Data Standards

What is a data standard?

A data standard is an agreed-upon approach that allows for consistent measurement, qualification, or exchange of an object, process, or unit of information. Data standards include documented agreements on the representation, format, definition, structure, tagging, transmission, manipulation, use, documentation, and management of data (see the National Library of Medicine (NLM) and Environmental Protection Agency (EPA) definitions for details).

Why use data standards?

The goal of data standards is to improve the usability of data and enable researchers to easily combine and co-analyze multiple datasets (i.e., improve interoperability).

Using data standards throughout the data lifecycle can save time and reduce costs, because researchers in similar areas do not need to build new or customized approaches.

Data standards should be used and preserved across the data lifecycle, including during:

  • Collection, such as by using standard forms, questions, and formats
  • Harmonization and curation, such as by mapping to ontologies or controlled vocabularies and implementing standard workflows
  • Submission to data repositories, for example by annotating common data elements (CDEs) in data dictionaries and providing metadata
  • Sharing through data repositories that enable use by other researchers
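
The harmonization and submission steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable name, the question wording, the permissible values, the identifier "EXAMPLE-0001", and the local-to-standard mapping are all hypothetical and do not come from any real CDE repository or controlled vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One data dictionary entry: a variable with a fixed set of allowed responses."""
    name: str
    question: str
    permissible_values: frozenset
    cde_id: str = ""  # hypothetical identifier annotating the variable as a CDE


# Hypothetical mapping from site-specific codes to a single controlled vocabulary.
LOCAL_TO_STANDARD = {"F": "Female", "female": "Female", "M": "Male", "male": "Male"}

# Hypothetical data dictionary entry, for illustration only.
SEX_AT_BIRTH = DataElement(
    name="sex_at_birth",
    question="What was the participant's sex assigned at birth?",
    permissible_values=frozenset({"Female", "Male", "Unknown"}),
    cde_id="EXAMPLE-0001",
)


def harmonize(raw_value, mapping):
    """Trim whitespace and map a local code to its standard term when one is known."""
    cleaned = raw_value.strip()
    return mapping.get(cleaned, cleaned)


def find_invalid(records, element, mapping):
    """Harmonize each record, then return (row_index, value) pairs that are still not permissible."""
    problems = []
    for index, record in enumerate(records):
        value = harmonize(record.get(element.name, ""), mapping)
        if value not in element.permissible_values:
            problems.append((index, value))
    return problems


records = [{"sex_at_birth": "F"}, {"sex_at_birth": "male "}, {"sex_at_birth": "n/a"}]
print(find_invalid(records, SEX_AT_BIRTH, LOCAL_TO_STANDARD))
```

Running a check like this before submission flags values that neither match the permissible set nor map to it, so they can be corrected or documented in the data dictionary.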

What data standards are relevant to NICHD?

Please note: NICHD does not approve or disapprove data standards. Instead, the institute encourages the use of data standards suitable to the study design, the type of data collected, characteristics of the dataset, and best practices of the respective research community.
Some data repositories may encourage or require the use of specific data standards. Check the NICHD Data Repository Finder to find the metadata requirements and data formats used by some NICHD-relevant data repositories.

The following existing data standards may be relevant to researchers who receive or are applying for NICHD support. Several categories of data standards exist, and sometimes a given standard falls into several categories. Although this information will be updated regularly, it is not intended to be comprehensive.

Metadata standards specify minimal elements to describe data, and how those elements are formatted. Examples include:

When a metadata standard does not exist for a given research project, researchers can create a “readme” file using an adaptable template from Cornell Data Services’ Guide to Writing “Readme” Style Metadata to help ensure that others can correctly interpret the data when sharing or publishing it.
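
As a rough illustration of what a “readme” style metadata file might contain, the sketch below assembles one as plain text. The section names, function name, and sample values are assumptions made for this example; they are not the Cornell template, which should be consulted for the recommended content.

```python
def build_readme(title, contact, files, variables):
    """Assemble a plain-text readme from basic descriptive metadata.

    files maps each file name to a short description.
    variables maps each variable name to a (description, units) pair.
    """
    lines = [f"Title: {title}", f"Contact: {contact}", "", "Files:"]
    for filename, description in files.items():
        lines.append(f"  {filename}: {description}")
    lines += ["", "Variables:"]
    for name, (description, units) in variables.items():
        lines.append(f"  {name}: {description} (units: {units})")
    return "\n".join(lines) + "\n"


# Hypothetical dataset details, for illustration only.
readme_text = build_readme(
    title="Example infant growth study",
    contact="data-manager@example.org",
    files={"growth.csv": "One row per participant visit"},
    variables={"birth_weight": ("Weight measured at birth", "grams")},
)
print(readme_text)
```

The returned string can be saved next to the data files (for example as README.txt) so that the description travels with the dataset.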

Controlled vocabularies, terminologies, and ontologies offer a consistent way to describe data by defining its semantic or contextual meaning. Controlled vocabularies, such as indices and subject headings, and terminologies, such as thesauri, help make data more sharable and searchable. Ontologies go a step further by describing the relationships between terms and concepts. Examples and resources include (in alphabetical order):

In March 2024, NIH hosted a workshop on Advancing the Use and Development of CDEs in Research. Resources discussed at the workshop and available on the workshop website may be useful for researchers using or developing CDEs.

CDEs are standardized, precisely defined questions paired with a set of specific, allowable responses that are used systematically across studies to ensure consistent data collection (see also NOT-LM-21-005: Request for Information: Use of CDEs in NIH-Funded Research). CDE resources include (in alphabetical order):

Common data models (CDMs) and standard schemas provide syntactic structure for how data and concepts are organized. Examples include (in alphabetical order):

These other standards and resources (in alphabetical order) may be useful to those who are applying for NICHD funding: