Data Standards

What is a data standard?

A data standard is an agreed-upon approach that allows for consistent measurement, qualification, or exchange of an object, process, or unit of information. Data standards include documented agreements on the representation, format, definition, structure, tagging, transmission, manipulation, use, documentation, and management of data (see the National Library of Medicine (NLM) definition and the Environmental Protection Agency (EPA) definition for details).

Why use data standards?

The goal of data standards is to improve the usability of data and enable researchers to easily combine and co-analyze multiple datasets (i.e., improve interoperability).

Using data standards throughout the data lifecycle can reduce time and costs by minimizing implementation of new or customized approaches for similar research areas.

Data standards should be used and preserved across the data lifecycle, including during:

Collection, such as using standard forms, questions, and formats
Harmonization and curation, such as mapping to ontologies or controlled vocabularies and implementing standard workflows
Submission to data repositories, for example by annotating common data elements (CDEs) in data dictionaries and providing metadata
Sharing, through data repositories that enable use by other researchers

What data standards are relevant to NICHD?

Please note: NICHD does not approve or disapprove data standards. Instead, the institute encourages the use of data standards suitable to the study design, the type of data collected, characteristics of the dataset, and best practices of the respective research community.
Some data repositories may encourage or require use of specific data standards. Check the NICHD Data Repository Finder to find metadata requirements and data formats used by some NICHD-relevant data repositories.

The following existing data standards may be relevant to researchers who receive or are applying for NICHD support. Several categories of data standards exist, and sometimes a given standard falls into several categories. Although this information will be updated regularly, it is not intended to be comprehensive.

Metadata Standards

Metadata standards specify the minimal elements needed to describe data and how those elements are formatted. Examples include:

Data Governance Metadata:
- Open Digital Rights Language standardizes digital rights such as permission, prohibition, and obligation statements.
General Metadata:
- Dublin Core Metadata Initiative (DCMI) is domain agnostic, basic, and widely used.
- The Research Data Alliance maintains an extensive catalog of metadata standards.
Geospatial Information:
- International Organization for Standardization (ISO) Standard 19115
- Federal Geographic Data Committee (FGDC): Content Standard for Digital Geospatial Metadata
Helping to End Addiction Long-term® (HEAL) Initiative: Use of HEAL Ecosystem Metadata is required for NIH HEAL Initiative® projects.
Infectious and Immune-Mediated Disease: National Institute of Allergy and Infectious Diseases (NIAID) Ecosystem Metadata Model
Medical Imaging: Medical Imaging and Data Resource Center (MIDRC)
Microarray Data: Minimum Information About a Microarray Experiment (MIAME)
Sequencing Data: Minimum Information About a Next-generation Sequencing Experiment (MINSEQE)
Social, Behavioral, Economic, and Health Sciences Data: Data Documentation Initiative (DDI) offers standards for data collected by surveys and other observational methods.
- Information about how the Inter-university Consortium for Political and Social Research is implementing DDI may also be of interest.

When a metadata standard does not exist for a given research project, researchers can create a “readme” file using an adaptable template from Cornell Data Services’ Guide to Writing “Readme” Style Metadata to help ensure that others can correctly interpret the data when it is shared or published.
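
As a rough illustration of what a general metadata record can look like, the sketch below checks a dataset description against the 15 element names of the Dublin Core Metadata Element Set and renders it as readme-style text. The helper names (`unknown_elements`, `render_readme`) and all record values are hypothetical and are not drawn from the Cornell template or from any repository's requirements.

```python
# The 15 element names of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unknown_elements(record):
    """Return the keys in `record` that are not Dublin Core element names."""
    return sorted(set(record) - DC_ELEMENTS)


def render_readme(record):
    """Render a record as 'element: value' lines, sorted by element name."""
    return "\n".join(f"{key}: {record[key]}" for key in sorted(record))


# Hypothetical dataset description; every value is a placeholder.
record = {
    "title": "Example infant sleep survey",
    "creator": "Example Lab",
    "date": "2024-01-15",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}

print(unknown_elements(record))                       # no unexpected keys
print(unknown_elements({**record, "pi_name": "X"}))   # flags the extra key
print(render_readme(record))
```

A check like this only confirms that element names are recognized; it does not judge whether the values are complete or meaningful.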

Controlled Vocabularies, Terminologies, & Ontologies

Controlled vocabularies, terminologies, and ontologies offer a consistent way to describe data by defining its semantic or contextual meaning. Controlled vocabularies, such as indices and subject headings, and terminologies, such as thesauri, help make data more sharable and searchable. Ontologies go a step further by describing the relationships between terms and concepts. Examples and resources include (in alphabetical order):

Centers for Disease Control and Prevention (CDC) Race and Ethnicity Code System Version 1.3 (PDF 166 KB)
Clinical Data Interchange Standards Consortium (CDISC) Controlled Terminology outlines the code lists and valid values used within CDISC-defined datasets.
Code on Dental Procedures and Nomenclature (CDT) from the American Dental Association
Common Terminology Criteria for Adverse Events (CTCAE) in Cancer Treatments
Current Procedural Terminology (CPT®), from the American Medical Association, offers a uniform process for coding medical and other health care services.
Data Use Ontology (DUO) is a comprehensive and community-driven effort to standardize data use conditions, specifically for research data in the biomedical domain.
Data Privacy Vocabulary (DPV) has developed a vocabulary and ontology of privacy and data protection-related terms, including a taxonomy of personal data and classification of purpose.
Disease Ontology provides consistent, reusable, and sustainable descriptions of human disease terms, phenotype characteristics, and related medical vocabularies.
Gene Ontology (GO) Resource is the world’s largest source of information on the functions of genes.
Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities encountered in human disease; coordinates with Mondo Disease Ontology.
International Classification of Diseases and Related Health Problems (ICD) provides critical knowledge on the extent, causes and consequences of human disease and death worldwide.
Logical Observation Identifiers Names and Codes (LOINC), from Regenstrief, provides an international standard for identifying health measurements, observations, and documents.
Medical Action Ontology (MAxO) is for annotating treatments and clinical management of human disease, including procedures, therapies, and interventions.
Medical Dictionary for Regulatory Activities Terminology (MedDRA) applies to all phases of drug and device development, excluding animal toxicology.
Medical Subject Headings (MeSH), a controlled vocabulary thesaurus maintained by the NIH’s NLM for indexing articles for PubMed
Mondo Disease Ontology provides a logic-based structure for unifying disease resources; integrates with HPO.
National Cancer Institute Thesaurus (NCIt)
Office of Management and Budget (OMB) Race and Ethnicity Standards
Ontology for Biomedical Investigations (OBI) defines more than 2,500 terms for scientific assays, devices, objectives, and more.
Ontology Lookup Service, from the European Molecular Biology Laboratory’s European Bioinformatics Institute, provides a single point of access to the latest ontology databases.
Ontology Resources for the Environmental Health Sciences from the National Institute of Environmental Health Sciences
SNOMED-CT, the most comprehensive clinical terminology in use around the world
RadLex from the Radiological Society of North America
RxNorm provides normalized names for clinical drugs and links to many drug vocabularies used in pharmacy management and drug interaction software.
Uber-Anatomy Ontology (Uberon) describes anatomy across species.
Unified Code for Units of Measure (UCUM) includes all units of measure used in international science, engineering, and business fields.
- UCUM Codes for Healthcare Units lists commonly used units for clinical lab values.
Unified Medical Language System (UMLS), maintained by the NIH’s NLM, integrates many health and biomedical vocabularies and standards.
Vaccines Administered (CVX) code set, from the CDC
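
The difference between a controlled vocabulary and an ontology can be sketched in a few lines: the vocabulary maps free-text variants to one preferred term, while the ontology also records how terms relate to one another. Everything in this sketch is invented for illustration. The terms, the `PREFERRED_TERM` and `IS_A` tables, and the function names are hypothetical and are not taken from any of the resources listed above.

```python
# Hypothetical controlled vocabulary: free-text variants -> one preferred term.
PREFERRED_TERM = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
}

# Hypothetical ontology fragment: each term points to its broader "is a" parent.
IS_A = {
    "myocardial infarction": "heart disease",
    "hypertension": "vascular disease",
    "heart disease": "cardiovascular disease",
    "vascular disease": "cardiovascular disease",
}


def normalize(raw):
    """Map a free-text value to its preferred term, or None if unmapped."""
    return PREFERRED_TERM.get(raw.strip().lower())


def ancestors(term):
    """Follow is-a links upward and return the broader terms, nearest first."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


term = normalize("  Heart Attack ")
print(term)             # myocardial infarction
print(ancestors(term))  # ['heart disease', 'cardiovascular disease']
```

Because the ontology stores relationships, a search for the broader term can also retrieve records annotated with any narrower term, which a flat vocabulary alone cannot do.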

Common Data Elements (CDEs)

In March 2024, NIH hosted a workshop on Advancing the Use and Development of CDEs in Research. Resources discussed at the workshop and available on the workshop website may be useful for researchers using and/or developing CDEs.

CDEs are standardized, precisely defined questions paired with a set of specific, allowable responses that are used systematically across studies to ensure consistent data collection (review also: NOT-LM-21-005: Request for Information: Use of CDEs in NIH-Funded Research). CDE resources include (in alphabetical order):

Adult Sickle Cell Quality of Life Measurement Information System (ASCQ-Me) is a patient-reported outcome measurement system for assessing the physical, social, and emotional impact of sickle cell disease on adults.
Cancer Data Standards Registry and Repository (caDSR II) from NCI
Disaster Research Response (DR2) Program CDEs for disaster and public health emergency research
Neuro-QoL™ (Quality of Life in Neurological Disorders) is a measurement system that evaluates and monitors the physical, mental, and social effects experienced by adults and children living with neurological conditions.
NIH HEAL Initiative® CDEs for pain, sleep, anxiety, and depression
NIH CDE Repository is the primary NIH repository for CDEs recommended or required by NIH institutes and centers and other organizations for research and other purposes.
- CDEs: Standardizing Data Collection (On-Demand Training)
NIH PROMIS (Patient-Reported Outcomes Measurement Information System) provides person-centered measures that evaluate and monitor physical, mental, and social health in adults and children.
NIH Toolbox is a comprehensive set of neuro-behavioral measurements that quickly assess cognitive, emotional, sensory, and motor functions.
- NIH Infant and Toddler (Baby) Toolbox offers validated measures for use in research on infants and toddlers.
PhenX Toolkit provides protocols for collecting consensus measures for phenotypes and exposures.
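
To make the idea of a question paired with a fixed set of allowable responses concrete, here is a minimal sketch of how a CDE might be represented and used to check collected data. The `CommonDataElement` class, the example question, and its response codes are all hypothetical; they are not taken from the NIH CDE Repository or any resource listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A question paired with its allowable responses (hypothetical structure)."""
    name: str
    question: str
    permissible_values: dict  # response code -> human-readable label

    def validate(self, response):
        """Return True if the response is one of the allowable codes."""
        return response in self.permissible_values


# Hypothetical CDE, invented for this example.
sleep_quality = CommonDataElement(
    name="sleep_quality",
    question="During the past 7 days, how would you rate your sleep quality?",
    permissible_values={
        "1": "Very poor",
        "2": "Poor",
        "3": "Fair",
        "4": "Good",
        "5": "Very good",
    },
)


def invalid_rows(cde, responses):
    """Return the indexes of responses that are not permissible values."""
    return [i for i, r in enumerate(responses) if not cde.validate(r)]


print(invalid_rows(sleep_quality, ["3", "5", "good", "0"]))  # [2, 3]
```

When two studies use the same element definition, their responses share the same codes and can be combined without recoding.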

Common Data Models (CDMs)

Common data models (CDMs) and standard schemas provide syntactic structure for how data and concepts are organized. Examples include (in alphabetical order):

Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) is required for data submissions to the U.S. Food and Drug Administration (FDA).
Informatics for Integrating Biology & the Bedside (i2b2) CDM Documentation helps researchers integrate medical record and clinical research data.
Observational Medical Outcomes Partnership (OMOP) from the Observational Health Data Sciences and Informatics initiative
PCORnet CDM from the National Patient-Centered Clinical Research Network (scroll to the end of the page for the CDM files)
United States Core Data for Interoperability (USCDI) for clinical research, from the Office of the National Coordinator for Health Information Technology; see also: NOT-OD-20-146: Accelerating Clinical Care and Research through the Use of the USCDI. USCDI+ provides domain or program-specific data element lists as an extension of the USCDI Common Data Model.
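
The sketch below illustrates the general idea behind a common data model: records from different sites, each with its own column names, are converted to one shared structure so they can be analyzed together. The target fields, site names, and mappings are hypothetical and far simpler than any of the models listed above; this is not the schema of OMOP, PCORnet, i2b2, or USCDI.

```python
# Hypothetical shared schema; real CDMs define many tables and fields.
COMMON_FIELDS = ("person_id", "birth_year", "sex")

# Hypothetical per-site mappings from local column names to common field names.
SITE_MAPPINGS = {
    "site_a": {"pid": "person_id", "yob": "birth_year", "gender": "sex"},
    "site_b": {"subject": "person_id", "birth_yr": "birth_year", "sex": "sex"},
}


def to_common_model(site, row):
    """Rename a site's columns to the common field names.

    Columns without a mapping are dropped; a row that cannot supply
    every common field raises ValueError.
    """
    mapping = SITE_MAPPINGS[site]
    converted = {mapping[col]: value for col, value in row.items() if col in mapping}
    missing = [field for field in COMMON_FIELDS if field not in converted]
    if missing:
        raise ValueError(f"{site} row is missing fields: {missing}")
    return converted


row_a = to_common_model("site_a", {"pid": "A-1", "yob": 2019, "gender": "F", "notes": "n/a"})
row_b = to_common_model("site_b", {"subject": "B-7", "birth_yr": 2020, "sex": "M"})
print(row_a)
print(row_b)
```

After conversion, both rows carry the same keys, so one query or analysis script can run against data from either site.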

Other Standards and Resources

These other standards and resources (in alphabetical order) may be useful to those who are applying for NICHD funding:

Brain Imaging Data Structure (BIDS)
Digital Imaging and Communications in Medicine (DICOM®) from the Medical Imaging and Technology Alliance
FAIRsharing Standards Registry (searchable)
Fast Healthcare Interoperability Resources (FHIR®) from Health Level 7 International (HL7) for healthcare data exchange; see also: NOT-OD-19-122: FHIR® Standard and FHIR® for Researchers Training
Genome Reference Consortium from NLM, for human reference genome assembly
Health Information Technology and Health Data Standards (NLM)
National Drug Code (NDC) Directory for active and certified finished and unfinished drugs submitted to FDA
WHODrug Dictionary of global medicinal products and active ingredients
