Introduction
Our goal is to derive meaningful insights from existing research on data sharing behaviors that research discovery ecosystems may apply to program evaluation. First, we present results from a landscape review of common data sharing incentives and barriers. We then summarize outcomes from key National Institutes of Health (NIH) Helping to End Addiction Long-term® (HEAL) Data Ecosystem (HDE) programs designed to foster a data sharing community. Herein, we adopt the National Library of Medicine’s definition of data sharing: “Data sharing refers to the practice of making data available to other research stakeholders, including other investigators, research subjects, and the broader public” (Network of the National Library of Medicine, 2024). For this analysis, we focus on data and other outcomes that scientific researchers are expected to share as part of the research lifecycle.
The NIH HEAL Data Ecosystem: Description and Composition
The HDE is part of the NIH HEAL Initiative, an NIH-wide effort to speed scientific solutions to stem the evolving national opioid public health crisis. HEAL-funded researchers share pain and opioid use disorder data of many kinds, including imaging, animal study, clinical, and qualitative data. HDE “seeks to accelerate sharing HEAL-generated data and results among the broad community of researchers, health care providers, community leaders, policy makers, and other HEAL stakeholders who can benefit from learning initiative research results” (NIH HEAL Initiative, 2021). HDE connects the HEAL community, enabling dataset search (via the HEAL Data Platform and HEAL Semantic Search), analysis, and reuse for new discoveries.
HEAL funding empowers “researchers to make their HEAL-generated data FAIR (findable, accessible, interoperable, and reusable)” and promotes data sharing (NIH HEAL Initiative, 2021). The HEAL Data Platform includes a search and discovery interface powered by rich metadata and secure, cloud-based workspaces. The platform does not store data but instead interoperates with the individual HEAL-compliant repositories in which HEAL data are deposited, providing secure access to datasets under the corresponding repository’s access restrictions and approval processes. Researchers may also take advantage of HDE tools, such as tools for submitting variable-level metadata.
The HEAL Data Stewardship Group (HEAL Stewards), which includes staff members from RTI International and the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill, helps facilitate HDE. The HEAL Stewards’ researcher-facing outreach programming and related responsibilities include:
- organizing, leading, and maintaining HDE governance structures, including the Collective Board;
- leading outreach and engagement strategy development and implementation, including webinars, workshops, consulting, and other community member training;
- leading HEAL Semantic Search integration with the HEAL Data Platform;
- providing guidance for selecting a repository and submitting datasets and metadata; and
- developing documentation and supporting materials for interacting with HEAL Semantic Search.
The HEAL Collective Board guides HDE strategy and direction to develop methods and norms, cultivate a culture of sharing, and maximize collaboration (NIH HEAL Initiative Data Stewardship Group, n.d.).
Background and Methods
Open, accessible research data provides a foundation for scientific discovery. During the COVID-19 pandemic, data sharing across disciplines hastened vaccine development: almost half (43 percent) of researchers working on vaccine research shared data openly (Druedahl et al., 2021). Recent government policies in the United States and other countries require the submission of data sharing plans to expand access to research outcomes. In 2023, the NIH implemented a revised Data Management and Sharing Policy (NOT-OD-21-013: Final NIH Policy for Data Management and Sharing; NIH, n.d.). The National Science Foundation, the National Endowment for the Humanities, and several other federal granting agencies also now require grant proposals to include data management plans.
Several researchers have examined the factors behind data sharing behaviors. For example, according to Late et al. (2024), “Supporting the scientific community, the open science agenda and fulfilling research funders’ requirements motivate scholars to share their data. Impeding factors relate to the qualities of data, ownership of data, data stewardship, and research integrity” (p. 386).
Data sharing hesitancy has repercussions for open science. Pearson (2003) argued that limited result sharing can lead to a decline in the open exchange of ideas, hindering scientific progress. Delayed data sharing impedes progress in health care research, potentially resulting in increased costs (Vickers, 2006). Research on data sharing behavior among scientists indicates that hesitancy may be rooted in several factors, including a lack of certainty about how, when, and where to share data and a competitive research culture in which high-impact publications drive career advancement. In addition, many researchers worry their discoveries will be “scooped,” possibly resulting in loss of credit. Conversely, factors that incentivize data sharing include researchers receiving full credit for their findings, adequate training in open science practices, and a collaborative research culture.
Ensuring that HEAL-funded researchers share research outcomes is a critical objective for the HEAL Stewards, and a landscape analysis helps connect their researcher-facing programs and activities with evidence-based practice. We began the literature review by identifying existing research studies that address data sharing factors in two general categories: (1) barriers/disincentives and (2) benefits/incentives. To explore how scholars currently define key data sharing barriers and incentives, our initial database search covered a broad spectrum of research disciplines and data types; the final set of references, however, samples more intensively from health, biomedical, and social sciences research. Biomedical data sharing likely differs from data sharing in nonmedical fields, but this review did not intentionally differentiate factors by field. Rather, the literature search was designed to identify how scholars have described the most common barriers and incentives across a range of disciplines.
Although the findings are not generalizable to all research fields, the factors identified in the landscape analysis cluster around common incentives and barriers that we anticipate will help HDE staff plan new programming or evaluate existing programming. The analysis and recommendations described herein aim to support an informed evaluation of how well HDE activities align with current best practice and where there may be room for improvement and expansion. Because most HDE activities were launched before the literature review was conducted, the evidence-based factors we identify here are primarily intended to be informative; specifically, the analysis will support the development of key metrics to assess HEAL Stewards’ outreach efforts.
Defining the Landscape: Data Sharing Incentives and Barriers
Barriers to Data Sharing
Despite clear benefits and recent technological advances that help streamline the process, data sharing remains stubbornly low in the sciences (Houtkoop et al., 2018; Pearson, 2003; Vines et al., 2013). A survey by Hipsley and Sherratt (2019) found that only 14 percent of investigators shared biological imaging data. Low data sharing rates occur even in federally funded research, suggesting that barriers may exist beyond a lack of awareness of how and why to share data. Researchers may encounter any number of barriers, including legal and ethical restrictions, time constraints, a lack of incentives, and the fear that sharing their data may result in scooping or exploitation (Hipsley & Sherratt, 2019; Houtkoop et al., 2018; Pearson, 2003; Tenopir et al., 2011). Legal and ethical restrictions may limit sharing certain types of data that could identify participants or endanger rare species if released (Duke & Porter, 2013; Pearson, 2003). Technical issues, such as limited storage options for large amounts of data, may pose logistical challenges, although improved infrastructure and software have begun to mitigate this challenge (Farley et al., 2018; Stephens et al., 2015).
Fear of Being Scooped, Career Advancement, and Citations
The fear of scooping—the idea that other researchers may exploit findings if results are shared prematurely—appears frequently in literature on barriers (Hipsley & Sherratt, 2019; Houtkoop et al., 2018; Pearson, 2003).
Concerns about loss of publication opportunities, which are critical for building academic reputation, applying for tenure, and securing grants, serve as disincentives to sharing (Callaway, 2019; Walsh & Hong, 2003). Publications represent investments of time and funding, and researchers who hesitate to share may view data as a proprietary resource (Barczak et al., 2022). Having research ideas scooped may threaten a researcher’s ownership over their work (Callaway, 2019) or damage an early career academic’s reputation (Teixeira da Silva & Dobránszki, 2015); lack of attribution also concerns some researchers (Devriendt et al., 2021).
In one survey of cell biologists, over 75 percent reported fear of getting scooped. Anxieties are heightened in rapidly moving fields like cell and molecular biology, where experiments can be designed, executed, and published within weeks (Pearson, 2003). Additionally, online data repositories, preprint servers, and electronic journal submissions may enable competitors to generate early manuscript versions and publish results ahead of the original researcher (Teixeira da Silva & Dobránszki, 2015). In a “winner-takes-all” culture, where reputation and careers hinge on high-profile, first-author publications, this sense of competitiveness (Barczak et al., 2022) exacerbates data sharing hesitancy. Some researchers respond by limiting prepublication communications altogether (Adams et al., 2018; Walsh & Hong, 2003) or by delaying data sharing to secure the first opportunity to present their findings (Hulsen, 2020; Mozersky et al., 2021).
In addition to fears of exploitation or scooping, researchers express concern about the need to prioritize their career advancement, which in scientific fields depends heavily on publishing. Early career researchers may struggle to receive credit for their research contributions (Hardy, 2021; Hutchings et al., 2020), and this perceived lack of credit may foster data sharing hesitancy. Soeharjono and Roche (2021) noted that researchers “report [more] benefits (47.9%) and neutral outcomes (43.6%) than costs (21.4%) from openly sharing data…[but] early career researchers were more likely to report costs” (p. 750). Career advancement opportunities also tend to be scarcer for early career researchers (Hutchings et al., 2020). Hutchings et al. (2020) propose “a shift away from the traditional criteria of academic promotion, which includes research outputs, to one which is inclusive of a researcher’s data sharing history and the availability of their research dataset for secondary analysis” (p. 26).
Collaborating on publications supports early career academics’ advancement; however, efforts to overcome hesitancy through co-authorship face challenges. Recounting an effort to coordinate publication of overlapping studies, Melbourne researcher Josh Hardy (2021) writes, “Rather than being redundant, our experiments had validated each other’s finding in different viruses and strengthened the result of both experiments. However, coordinating publications is not always straightforward. Many journals do not have clear mechanisms for co-submission and do not sufficiently support the model” (p. 2). Hardy argues that if early career researchers are to engage in collaborative research, “more scientific journals need to support and have guidelines for reviewing and accepting joint submissions” (Hardy, 2021, p. 3). In addition to transforming publication models, academic culture should reward researchers for engaging in research collaborations (Hutchings et al., 2020).
Strategies for measuring research impact also drive data sharing behaviors. Citation metrics, for example, tend to define a researcher’s scientific stature. In a 2019 analysis of a representative sample of United States and Canadian institutions, 40 percent of research-intensive institutions referred to the journal impact factor in their review, promotion, and tenure documents (McKiernan et al., 2019).
Data Equity and Access
Data sharing can introduce data equity challenges, further exacerbating hesitancy. Common data equity concerns relate to sensitive data handling and information access. Finding a balance between open and accessible data sharing and privacy/sensitivity concerns remains a challenge (Sardanelli et al., 2018; Vickers, 2006). Addressing researcher, patient, and community concerns is critical to data sharing, particularly as patients and/or research participants are increasingly recognized as the rightful owners of their data (Hulsen, 2020; Vickers, 2006). Regulations governing health data privacy, including the Health Insurance Portability and Accountability Act, constrain open data sharing. Survey findings suggest overcoming sensitive data barriers may require articulating explicit norms, incentives, Institutional Review Board processes, and levels of trust around open data (Hipsley & Sherratt, 2019; Houtkoop et al., 2018).
Other considerations include creating equitable policies governing appropriate data sharing, particularly with respect to low-resource communities. Clear agreements and effective sensitive data policies help ensure responsible and ethical data sharing (Hulsen, 2020; Vickers, 2006). Pratt and Bull (2021) highlighted five data sharing barriers in low-resource communities: (1) lack of infrastructure and technology necessary to use and analyze data; (2) lack of research credit for data reuse; (3) inaccessible research outcomes (publications, presentations, and data); (4) population-specific stigmas; and (5) other adverse consequences to communities.
Incentives for Data Sharing
Research on data sharing highlights the many benefits to fostering an open data sharing culture. Sharing research data creates opportunities for collaboration and knowledge-building (Adams et al., 2018; Barczak et al., 2022) and supports reproducibility and reuse (Berman et al., 2015; Houtkoop et al., 2018; Wilkinson et al., 2016). Secondary data analysis often leads to cross-disciplinary discoveries (Reichstein et al., 2019; Stephens et al., 2015). Data sharing accelerates public health research on topics such as disease outbreaks and climate change (Sarabipour et al., 2019; Tse et al., 2020). Moderating the often-competitive research culture, supporting researchers’ career advancement, providing credit for research contributions, and optimizing publication/citation opportunities are some of the most common themes that the literature on incentives for data sharing addresses.
Open Research Culture, Career Advancement, and Citations
Funding agency mandates are one of the primary drivers of sharing behavior. Federal policy now requires data sharing for much publicly funded research, and many international publishers have also adopted policies requiring researchers to provide access to their data as soon as possible (Barczak et al., 2022; Chawinga & Zinn, 2019). Given appropriate support, researchers tend to share data more willingly.
In addition to policy-driven sharing, researchers choose to share data for various reasons. Barczak et al. (2022) observed that researchers recognize community benefits from sharing (mutual support and collaboration). In fact, collaboration is often a silver lining of data sharing, despite researcher misgivings, so the collaborative benefits of sharing may counterbalance the fear of scooping. Both the sharing process itself and a shared commitment to open science practices within a research community or discipline help limit disincentives. Laine (2017) reported on a project in which a culture of open data encouraged traditional competitors to collaborate and “focus their projects on different research themes to avoid direct competition” (p. 6). Melero and Navarro-Molina (2020) highlighted the promise of increased citations as one positive outcome, but beyond these direct benefits, the concept of openness as a moral/ethical good is also a cultural factor that supports data sharing (Lounsbury et al., 2021).
Data sharing incentives include training (Houtkoop et al., 2018); funding that covers repository fees/costs; credit in the form of citations (Melero & Navarro-Molina, 2020); and a clear process for sharing data (Hipsley & Sherratt, 2019). The promise of increased citations convinces some researchers to make datasets available (Curty et al., 2016; Gomes et al., 2022); however, researchers must understand where and how to share data. Devriendt and colleagues (2021) identify incentives that need to be present for researchers to feel comfortable sharing data, including credit/recognition, transparency, reciprocity, and trust. Hipsley and Sherratt (2019) explored key drivers and reported that financial rewards in any form increase data sharing behavior. Soeharjono and Roche (2021) examined both barriers and incentives, reporting that researchers interviewed tended to experience a sense of personal reward after sharing, although this is less a tangible incentive than a general benefit. Soeharjono and Roche also found that career benefits (advancement and stature) may serve as incentives.
Research Access, Efficiency, and Impacts
In a broad literature review of factors that incentivize data sharing, Zuiderwijk and colleagues (2020) found that incentives vary by researcher background (discipline) but that formal data access requirements and policies, such as data sharing mandates, serve as a key driver. In addition, automatic dataset publication (research efficiency) and institutional financial support help improve data sharing rates. A wide range of personal incentives also drive sharing, including researchers’ commitments to (1) reproducibility, (2) a culture of sharing, (3) advancing research in their field (research impact), and (4) validating results. Zuiderwijk and colleagues also identified conditions favorable to sharing: appropriate research data repositories; shorter embargo periods; minimal risk to participant privacy; rewards and recognition for publication and data sharing; increased citations; social influence; more research collaborations; experience and skills in sharing; and data types that support sharing (i.e., are easy to convert to open formats).
Although less comprehensive than Zuiderwijk and colleagues’ literature review, Laine’s (2017) broad exploration of data sharing incentives confirms the benefits of increased citations and publications. Similarly, Woods and Pinfield (2021), in a literature review, categorized key data sharing incentives thematically, including:
the need to build on existing cultures and practices, meeting people where they are and tailoring interventions to support them; the importance of publicizing and explaining the policy/service widely; the need to have disciplinary data champions to model good practice and drive cultural change; the requirement to resource interventions properly; and the imperative to provide robust technical infrastructure and protocols, such as labeling of data sets, use of DOIs [digital object identifiers], data standards and use of data repositories. (p. 1)
Literature Summary
Table 1 summarizes the frequently mentioned data sharing barriers and the incentives that may help mitigate these barriers.
Fostering a Culture of Data Sharing in the HEAL Data Ecosystem Through Engagement and Outreach
Overview of Data Sharing in HDE
HDE’s design is informed by a distributed data system model. Distributed systems vary widely in their implementation but tend to include differentiated governance and geographically dispersed infrastructure components. HDE serves as a centralized metadata catalog: the HEAL Data Platform aggregates metadata from HEAL studies and serves as a central discovery portal, giving users tools to discover and easily access HEAL-funded data. The study datasets themselves are stored in HEAL-compliant digital repositories. Researchers preparing to deposit data may select from an abbreviated list of prevetted repositories and generally have some flexibility to choose the one that best suits their data. Figure 1 illustrates the HDE’s primary components, which include HEAL-supported researchers, data repositories, and community stakeholders.
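To make this division of labor concrete, the Python sketch below models a central catalog that holds only metadata and a pointer to the repository where each dataset lives. It is a minimal illustration, assuming a simplified record with a title, keywords, repository name, and dataset link; it is not the HEAL Data Platform’s actual implementation or API, and the class names, fields, repository names, and URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StudyMetadata:
    """Hypothetical study-level metadata record; the dataset itself stays in its repository."""
    study_id: str
    title: str
    keywords: frozenset[str]
    repository: str   # HEAL-compliant repository that stores the data (illustrative name)
    dataset_url: str  # pointer to the dataset in that repository (illustrative URL)


@dataclass
class MetadataCatalog:
    """Hypothetical central discovery index that holds metadata and pointers, never the data."""
    approved_repositories: set[str]
    records: dict[str, StudyMetadata] = field(default_factory=dict)

    def register(self, record: StudyMetadata) -> None:
        # Accept a record only if its data are deposited in a prevetted repository.
        if record.repository not in self.approved_repositories:
            raise ValueError(f"{record.repository!r} is not an approved repository")
        self.records[record.study_id] = record

    def search(self, term: str) -> list[StudyMetadata]:
        # Case-insensitive: substring match on titles, exact match on keywords.
        term = term.lower()
        hits = [
            r for r in self.records.values()
            if term in r.title.lower() or term in {k.lower() for k in r.keywords}
        ]
        return sorted(hits, key=lambda r: r.study_id)


catalog = MetadataCatalog(approved_repositories={"Repository A", "Repository B"})
catalog.register(StudyMetadata("S1", "Chronic pain imaging study",
                               frozenset({"pain", "imaging"}),
                               "Repository A", "https://example.org/a/s1"))
catalog.register(StudyMetadata("S2", "Opioid use disorder treatment trial",
                               frozenset({"oud", "clinical"}),
                               "Repository B", "https://example.org/b/s2"))
print([(r.study_id, r.repository) for r in catalog.search("pain")])
# [('S1', 'Repository A')]
```

Because a catalog of this kind stores only pointers, access to the data themselves would remain governed by each repository’s own restrictions and approval processes, consistent with the platform description earlier in this article.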
HDE promotes data sharing by setting expectations that researchers who produce data will share them with the ecosystem. Each HEAL-funded study is expected to submit study-level and variable-level metadata. Other HEAL-specific implementation steps include:
- Register on the HEAL Data Platform and submit necessary metadata.
- Use HEAL common data elements.
- Use broad consent language.
- Indicate the planned HEAL-compliant repository.
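Study teams or program staff might track progress on these steps as a simple checklist. The Python sketch below is a hypothetical illustration, assuming each step can be recorded as done or not done; the field names are invented for this example and do not correspond to any HEAL Data Platform schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyStatus:
    """Hypothetical progress record for one HEAL-funded study (field names are illustrative)."""
    registered_on_platform: bool = False
    study_metadata_submitted: bool = False
    variable_metadata_submitted: bool = False
    uses_common_data_elements: bool = False
    uses_broad_consent: bool = False
    planned_repository: Optional[str] = None  # name of the planned HEAL-compliant repository


def outstanding_steps(status: StudyStatus) -> list[str]:
    """Return the implementation steps the study has not yet completed, in checklist order."""
    checks = [
        (status.registered_on_platform, "Register on the HEAL Data Platform"),
        (status.study_metadata_submitted, "Submit study-level metadata"),
        (status.variable_metadata_submitted, "Submit variable-level metadata"),
        (status.uses_common_data_elements, "Use HEAL common data elements"),
        (status.uses_broad_consent, "Use broad consent language"),
        (status.planned_repository is not None, "Indicate the planned HEAL-compliant repository"),
    ]
    return [step for done, step in checks if not done]


# Example: a study that has registered, submitted study-level metadata, and named a repository.
status = StudyStatus(
    registered_on_platform=True,
    study_metadata_submitted=True,
    planned_repository="Repository A",  # hypothetical repository name
)
print(outstanding_steps(status))
# ['Submit variable-level metadata', 'Use HEAL common data elements', 'Use broad consent language']
```

Keeping each step as an explicit yes/no item is one simple way to aggregate completion across studies; a real tracking system would depend on the data HEAL actually collects.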
HDE supports activities that foster a sense of community around data sharing. These activities are discussed in more detail in subsequent sections.
HDE Outreach and Engagement Activities
As of fall 2024, approximately 20 percent of HEAL-funded studies have selected a repository. As HDE has evolved, various stakeholders have identified factors that tend to promote or inhibit participation in data sharing activities. In addition to the factors identified by scholars (described in the previous sections), HEAL Stewards have identified common researcher questions that, when addressed, help foster HDE-wide data sharing participation:
Questions related to why to share data:
- Does my study type fit within the NIH HEAL Initiative’s sharing requirements?
- If I am working on a study that does not require data sharing, are there ways to participate in the HEAL processes to increase transparency in my work?
Questions related to how to share data:
- How do I comply with the 2023 NIH Data Management and Sharing Policy?
- What are the FAIR principles, and how do they affect my data sharing protocols?
- How do I use required common data elements and other metadata standards?
- How do I create research documentation, such as README.txt files?
Questions related to where to share data:
- Would a generalist or specialized repository be more suitable for my data?
- Which repositories specialize in my study’s data type?
Questions related to when to share data:
- When in the research lifecycle should I select a repository?
- Should I share data before my study/grant has ended?
To help researchers navigate these concerns and move toward successfully sharing data, the HDE has developed a suite of services and tools to connect researchers with just-in-time resources for overcoming barriers and enhancing incentives. HDE services include a wide range of in-person and asynchronous support, from direct assistance with selecting a HEAL-compliant repository to webinars on navigating sensitive data. HEAL Stewards encourage study teams to implement FAIR principles (devised by Wilkinson et al., 2016) in their data management strategies. HEAL Stewards’ webinars and consulting activities aim to address researcher concerns and questions, explain how to participate effectively in HDE, connect researchers to the most suitable repositories for their study, and provide efficient data sharing guidance. The HEAL Collective Board, comprising more than 20 active HEAL-funded investigators, meets regularly to advise HEAL Stewards and help promote a culture of data sharing throughout the HEAL research community.
In Table 2, we list HEAL Stewards’ outreach and support services to date, identify the barriers or incentives to which they most directly correspond, and provide an assessment of the programs’ effectiveness in addressing the barriers.
Results and Recommendations
The landscape review will help inform HDE’s ongoing efforts to address data sharing challenges. Programs implemented before the review have yielded positive outcomes and have also revealed areas where alignment with researcher needs could improve. Understanding where researchers face challenges will contribute to HDE refinements and expansion. We recommend the following strategies to fine-tune HDE’s alignment with the key factors that support a data sharing culture:
- HDE should continue to build on early successes. For example, Fresh FAIR webinars, one-to-one consultations, and ongoing outreach programming have resulted in demonstrated increases in HDE participation. Evaluating these efforts in light of common researcher barriers and incentives helps program administrators understand the nuances of data sharing behavior. Much work remains to foster a sense of community with respect to data sharing. Additional planned efforts will focus on (1) identifying potential dataset-associated publications and (2) targeting principal investigator (PI) engagement to support HEAL-funded study teams and address data sharing challenges.
- The HEAL Stewards and NIH HEAL Initiative leadership should continue to refine HDE programs and services in response to input from the Collective Board and from individual researchers about the barriers they face. In particular, the team recognizes the challenge of tracking research outcomes over time and the lack of mechanisms linking publications with data management plans.
- Future programming should build on lessons learned through engagement activities and landscape analyses (including data asset inventories), which point to a continued need for outreach, online resources/guidance, and instructional programming. Researchers at all levels, particularly new researchers, benefit from data management support. In addition to consulting services, fostering community-wide connections and ensuring that researchers are aware of existing resources at their institutions will help study teams cross the data sharing finish line and cultivate a vibrant and collaborative research community.
Conclusion
One of HDE’s core objectives is to implement evidence-driven strategies for building a culture of research data sharing and collaborative discovery. Existing literature provides a foundation for growing and refining HDE to meet researchers’ needs and address their challenges; however, additional feedback from HEAL researchers would help the HEAL Stewards tailor programming more precisely. Although substantial challenges to improving data sharing participation rates remain, the HEAL Stewards’ outreach and engagement activities have been foundational in addressing some common barriers to sharing and in fostering a collaborative research culture throughout the HEAL community. Much work remains to ensure that HDE programs fully address the gaps in researchers’ capacity to meet data sharing expectations. Exploring research evidence on data sharing behaviors, including the most common barriers to and incentives for sharing data, supports outreach programming and helps address researcher concerns. Practical guidance and services that address known barriers and provide targeted participation incentives are essential to foster researchers’ ability and willingness to share data and digital assets.
Data Availability Statement
In this publication, we do not report on, analyze, or generate any data.
Generative AI Use
We confirm that we did not use generative AI tools/services to author this submission.
Acknowledgments
The authors wish to acknowledge the National Institutes of Health Helping to End Addiction Long-term® (HEAL) Initiative, which provides funding for the HEAL Data Ecosystem.
RTI Press Associate Editor: Janelle Armstrong-Brown