A data standard is an agreed-upon approach that allows for consistent measurement, qualification, or exchange of an object, process, or unit of information. Data standards include documented agreements on the representation, format, definition, structure, tagging, transmission, manipulation, use, documentation, and management of data (see the National Library of Medicine (NLM) definition and Environmental Protection Agency (EPA) definition for details).
Why use data standards?
The goal of data standards is to improve the usability of data and enable researchers to easily combine and co-analyze multiple datasets (i.e., improve interoperability).
Using data standards throughout the data lifecycle can reduce time and costs by minimizing implementation of new or customized approaches for similar research areas.
Data standards should be used and preserved across the data lifecycle, including during:
Collection, such as using standard forms, questions, and formats
Harmonization and curation, such as mapping to ontologies or controlled vocabularies and implementing standard workflows
Submission to data repositories, for example by annotating common data elements (CDEs) in data dictionaries and providing metadata
Sharing, through data repositories that enable use by other researchers
What data standards are relevant to NICHD?
Please note: NICHD does not approve or disapprove data standards. Instead, the institute encourages the use of data standards suitable to the study design, the type of data collected, characteristics of the dataset, and best practices of the respective research community. Some data repositories may encourage or require the use of specific data standards. Check the NICHD Data Repository Finder for the metadata requirements and data formats used by some NICHD-relevant data repositories.
The following existing data standards may be relevant to researchers who receive or are applying for NICHD support. Several categories of data standards exist, and sometimes a given standard falls into several categories. Although this information will be updated regularly, it is not intended to be comprehensive.
Metadata standards specify minimal elements to describe data, and how those elements are formatted. Examples include:
Data Governance Metadata:
Open Digital Rights Language standardizes digital rights such as permission, prohibition, and obligation statements.
Social, Behavioral, Economic, and Health Sciences Data:
Data Documentation Initiative (DDI) offers standards for data collected by surveys and other observational methods.
When a metadata standard does not exist for a given research project, researchers can create a “readme” file using an adaptable template from Cornell Data Services’ Guide to Writing “Readme” Style Metadata to help ensure that others can correctly interpret the data when sharing or publishing it.
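As a rough illustration of "minimal elements" and a readme-style file, the sketch below checks a metadata record against a short list of required fields and renders it as plain text. The field names and layout are assumptions made for this example; they are not taken from the Cornell template or from any published metadata standard.

```python
# Hypothetical minimal elements. A real template or standard defines its own list.
REQUIRED_FIELDS = ["title", "creator", "date_collected", "file_format", "variables"]


def missing_fields(metadata):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def render_readme(metadata):
    """Render the record as plain 'FIELD: value' lines for a readme file."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))
    lines = []
    for field in REQUIRED_FIELDS:
        value = metadata[field]
        if isinstance(value, dict):
            # Nested entries (e.g., variable name -> description) are indented.
            lines.append(field.upper() + ":")
            lines.extend("  {}: {}".format(name, desc) for name, desc in value.items())
        else:
            lines.append("{}: {}".format(field.upper(), value))
    return "\n".join(lines) + "\n"
```

Calling `render_readme` on a complete record returns text that can be saved next to the data files. An incomplete record raises an error that names the missing elements, so gaps are caught before the data are shared.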
Controlled vocabularies, terminologies, and ontologies offer a consistent way to describe data by defining its semantic or contextual meaning. Controlled vocabularies, such as indices and subject headings, and terminologies, such as thesauri, help make data more sharable and searchable. Ontologies go a step further by describing the relationship between terms and concepts. Examples and resources include (in alphabetical order):
Current Procedural Terminology (CPT®), from the American Medical Association, offers a uniform process for coding medical and other health care services.
Data Use Ontology (DUO) is a comprehensive and community-driven effort to standardize data use conditions, specifically for research data in the biomedical domain.
Data Privacy Vocabulary (DPV) has developed a vocabulary and ontology of privacy and data protection-related terms, including a taxonomy of personal data and classification of purpose.
Disease Ontology provides consistent, reusable, and sustainable descriptions of human disease terms, phenotype characteristics, and related medical vocabularies.
Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities encountered in human disease; coordinates with Mondo Disease Ontology.
Medical Action Ontology (MAxO) is for annotating treatments and clinical management of human disease, including procedures, therapies, and interventions.
Ontology Lookup Service, from the European Molecular Biology Laboratory’s European Bioinformatics Institute, provides a single point of access to the latest ontology databases.
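The difference between a controlled vocabulary and an ontology can be sketched in a few lines. In this minimal example, the vocabulary is a flat set of allowed terms, and the ontology adds "is a" relationships between those terms. The terms and the hierarchy are illustrative assumptions and are not drawn from the Disease Ontology or any other resource listed above.

```python
# Controlled vocabulary: a flat set of allowed terms (illustrative only).
VOCABULARY = {"disease", "infectious disease", "viral infection", "influenza"}

# Ontology: the same terms plus "is a" relationships (child -> parent).
IS_A = {
    "infectious disease": "disease",
    "viral infection": "infectious disease",
    "influenza": "viral infection",
}


def is_valid_term(term):
    """A vocabulary can only say whether a term is allowed."""
    return term in VOCABULARY


def ancestors(term):
    """Follow 'is a' links from a term up to the root, nearest parent first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_kind_of(term, category):
    """An ontology can also answer whether one term falls under another."""
    return term == category or category in ancestors(term)
```

With only the vocabulary, a search for "infectious disease" finds only records tagged with that exact term. With the relationships, `is_kind_of("influenza", "infectious disease")` is true, so records annotated with the narrower term can be retrieved as well.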
In March 2024, NIH hosted a workshop on Advancing the Use and Development of CDEs in Research. Resources discussed at the workshop and available on the workshop website may be useful for researchers using and/or developing CDEs.
CDEs are standardized, precisely defined questions paired with a set of specific, allowable responses that are used systematically across studies to ensure consistent data collection (review also: NOT-LM-21-005: Request for Information: Use of CDEs in NIH-Funded Research). CDE resources include (in alphabetical order):
Neuro-QoL™ (Quality of Life in Neurological Disorders) is a measurement system that evaluates and monitors the physical, mental, and social effects experienced by adults and children living with neurological conditions.
NIH CDE Repository is the primary NIH repository for CDEs recommended or required by NIH institutes and centers and other organizations for use in research and for other purposes.
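The definition of a CDE as a question paired with a set of allowable responses maps naturally onto a small data structure. The sketch below is one possible representation, not the format used by the NIH CDE Repository; the class, its field names, and the example element are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A precisely worded question paired with its allowable responses."""
    name: str
    question: str
    permissible_values: tuple

    def is_valid(self, response):
        # Responses must match a permissible value exactly (case-sensitive).
        return response in self.permissible_values


# Hypothetical CDE, invented for illustration.
FEEDING_METHOD = CommonDataElement(
    name="feeding_method",
    question="How is the infant currently fed?",
    permissible_values=("Breast milk only", "Formula only", "Both", "Unknown"),
)


def invalid_responses(cde, responses):
    """Return the collected responses that are not permissible for the CDE."""
    return [r for r in responses if not cde.is_valid(r)]
```

For example, `invalid_responses(FEEDING_METHOD, ["Both", "both", "N/A"])` flags `"both"` and `"N/A"`. Two studies that use the same element with the same permissible values can then be combined without recoding.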