When describing what data will be generated and shared, include details such as species (or other source), format, and amount.
These details matter because they impact whether other aspects of the DMS plan are appropriate. For example: Should there be privacy protections (only needed for human data) on the data? Is the data repository suitable? Is special software needed to access the data files? Does the NIH Genomic Data Sharing Policy apply?
ODSS continues to receive plans that are ambiguous about whether the data will come from a human or a fish. In some plans, it is also difficult to distinguish the data that will be shared from all the data that will be generated.
Many plans indicate the researcher will share only data associated with publications. However, the DMS Policy expects that all scientific data will be shared, by the end of the award period, regardless of publication status.
The DMS plan should outline activities for sharing publication-associated data and scientific data not related to publications, unless there is a justifiable reason to limit sharing (note related issue #6).
Sharing non-publication-associated data has been part of the policy from its inception: it is built into the policy’s definition of scientific data, which is tied to principles of data quality, reuse, and reproducibility regardless of publication. The concept was further emphasized by both the White House Office of Science and Technology Policy (OSTP) 2022 memorandum on Ensuring Free, Immediate, and Equitable Access to Federally Funded Research (PDF 372 KB), and the NIH Plan to Enhance Public Access to the Results of NIH-Supported Research (NOT-OD-23-091).
The NIH DMS Policy expects that researchers will share scientific data through established data repositories. Sharing data through publications, local servers, or lab websites is not the same as using a repository and does not meet the Policy’s expectations.
Identifying, learning about, and committing to the repositories’ processes and features helps you complete the rest of your plan. NICHD created a Data Repository Finder to help researchers select an appropriate repository; other NIH resources are also available.
Data repositories offer many benefits, including fostering collaboration and building community. They leverage technical and scientific expertise to make data Findable, Accessible, Interoperable, and Reusable (FAIR) through the use of common formats and search and visualization interfaces, and by providing continuous and refined processes for data submission, quality control, curation, harmonization, and release of multiple datasets over time.
In addition, many repositories are actively working to adhere to all of the White House OSTP’s Desirable Characteristics of Data Repositories for Federally Funded Research (PDF 373 KB), which will further improve data FAIRness, security, and the ability to track use of shared data.
ODSS’s analysis of DMS plans submitted to NICHD found a clear over-reliance on generalist data repositories, which accept any data types or formats. Although generalist repositories can be a tempting default for sharing data, NIH DMS Policy guidance emphasizes that domain- or discipline-specific data repositories are always preferred.
Domain- and discipline-specific data repositories have services and features that are designed to serve the needs of their specific research communities, which ultimately increase the usability and overall value of shared data. Generalist repositories are useful, and NIH is investing in making them better, but you should only use them if there is not an appropriate domain-specific data repository available.
Figure 1: Repository Frequency in Element 4 of DMS Plans Submitted to NICHD
NICHD ODSS assessed the repositories and other systems researchers proposed for data sharing to help provide better guidance on repository suitability. For example, GitHub, PubMed, BioRxiv, and REDCap are not designed to serve as data repositories. In addition, many researchers are defaulting to generalist repositories (like OSF, Zenodo, Dataverse, Dryad, and Figshare) when they should be prioritizing domain-specific data repositories. The figure summarizes the systems NICHD researchers named in Element 4 of the NIH DMS plan format page, which is where researchers are prompted to describe where data will be shared.
Proposing to share data “by request” or where the PI controls access to the data is generally not acceptable under the DMS Policy. Not only does a growing body of literature demonstrate that this approach is not effective, it also inhibits the ability to make the data accessible to the larger research community (i.e., hundreds or thousands of users). So, this approach is not aligned with the Policy’s definition of data sharing.
ODSS’s analyses revealed some confusion in the community about the difference between “by request” and the NIH definition of “controlled access,” which is a privacy protection described in the DMS Policy materials. When you are concerned about research participant privacy, you should consider strategies like de-identification and using data repositories with a controlled-access option. Controlled-access repositories have established processes for verifying appropriate use of shared data, including steps like confirming requestor identity, establishing centralized data use agreements, and obtaining committee approval of secondary use requests. Because these aspects are already managed by the repository, there is no need for researchers to manage access directly or establish data use agreements on their own—often burdensome and time-consuming activities that can delay data access for years.
By using a controlled-access data repository, you are leveraging established and “well-oiled” processes and agreements that can enable data access in a matter of hours or days.
Researchers must provide a clear and thorough justification for anything that deviates from NIH DMS Policy expectations. If you are not sharing all of your scientific data, through a data repository, by the end of the award period, you need to clearly justify “why not” for NIH staff to review.
Rather than vaguely stating there are “legal reasons” or “policies,” cite the specific law or policy and explain why and how that impacts sharing your data. In some cases, a law or policy (like Small Business Innovation Research/Small Business Technology Transfer data rights) allows you to delay data sharing but does not require you to, so you may need to invoke that protection specifically. If you choose to invoke such a reason to delay sharing, you must explicitly state that in your plan. Similarly, rather than claiming “privacy issues” or “ethical concerns,” describe how you engaged Institutional Review Board expertise to determine reasons for limitations, such as why you will redact certain elements from shared data.
Researchers should plan to share data for as long as possible within the repository’s data retention cycles. Researchers sometimes confuse data repository retention policies with their local record retention cycles and mistakenly state they will only share data for “5 years” or whatever time frame is specified in their records management policy.
Data repository retention cycles are usually much longer, and repositories will often host data “in perpetuity.” It is not appropriate for the researcher to take data down any sooner.
NICHD wants to make sure that researchers are requesting sufficient funds to support the DMS activities described in their DMS plans throughout the life of the project, not just at the project’s end. The DMS Budget Justifications should summarize all costs relevant to activities described in the DMS plan. You should include DMS Budget Justifications even when you are not requesting DMS-specific costs, so that you can explain why funds are not needed.
ODSS expects that the majority of DMS costs will fall into the personnel categories for the time and effort required to prepare data for submission to repositories. Researchers can reduce this effort by designing data collection and analysis approaches that align with a data repository’s submission requirements (formats, metadata, documentation, etc.). Many established data repositories do not charge submission fees, or they charge a minimal one-time fee; not having to worry about long-term storage costs after data are submitted is another reason to use established repositories.