Genomic Data Sharing

Researchers who plan to submit genomic and linked phenotypic data to NIH-designated repositories must obtain institutional certification that their data submission plans are consistent with NIH policies. The UW IRB is responsible for reviewing researchers’ genomic data sharing plans and consent forms to verify that NIH certification requirements have been met.

Detailed information can be found in this guidance and in the support materials listed below.

Purpose and Applicability

This webpage provides guidance to IRB members, HSD staff, and researchers about the review of: (1) research involving plans for sharing genomic data with NIH-designated repositories; and (2) requests for certification of the data.

Overall Considerations for Institutional Certification

The institutional certification should state whether the data will be submitted to an unrestricted or controlled-access database.

The institutional certification should assure:

  • Consistency with applicable law, regulation, and policy. The data submission is consistent, as appropriate, with applicable national, tribal, and state laws and regulations as well as relevant institutional policies. (details below)
  • Data limitations are delineated. Any limitations on research use of the data, as expressed in the informed consent documents, are delineated. (details below)
  • Data is confidential. The identities of research participants will not be disclosed to NIH-designated repositories. (details below)
  • IRB review. An IRB, privacy board, and/or equivalent body, as applicable, has reviewed the investigator’s proposal for data submission and assures that:
    • The protocol for the collection of genomic and phenotypic data is consistent with 45 CFR 46. (details below)
    • Data submission and subsequent data sharing for research purposes are consistent with the informed consent of the study participants from whom the data were obtained. (details below)
    • Consideration was given to risks to individual participants and their families associated with data submitted to NIH-designated repositories and subsequent sharing. (details below)
    • To the extent possible, consideration was given to risks to groups or populations associated with submitting data to NIH-designated repositories and subsequent sharing. (details below)
    • The investigator’s plan for de-identifying data sets is consistent with the standards outlined in the NIH GDS policy. (details below)

Consistency with Applicable Laws and Policies

HSD staff identify applicable laws and policies as they would for review of any application (GUIDANCE Human Subjects Regulations; WORKSHEET Pre-Review).

Applicable policies include HSD SOPs and may include UW Privacy Policies.

Applicable laws frequently include HHS human subjects protections regulations (45 CFR 46), FDA human subjects protection regulations (21 CFR Parts 50 and 56), and the Health Insurance Portability and Accountability Act Privacy Rule (45 CFR Part 160 and Part 164, Subparts A and E).

Data submission must also be consistent with applicable tribal laws when the data are from American Indian and Alaska Native peoples. For example, tribal nations have jurisdiction over research conducted on tribal lands with tribal citizens. In general, the IRB relies on the researcher to provide relevant information about tribal laws as requested in the IRB Protocol form.

Data Collection is Consistent with 45 CFR 46

Data collection procedures must be consistent with HHS human subjects protections regulations.

  • Prospectively collected data. This is verified either by the UW IRB review process or by relying on an external IRB for review.
  • Retrospectively collected data. This is verified by confirming that data were collected with IRB approval.

Consideration of Risks

Risk Assessment

The IRB considers the risks associated with the genomic information in the event of re-identification and disclosure. The IRB considers ways to minimize those risks within the context of the expected benefits of broad sharing.

The IRB also considers the extent to which genomic information associated with the participant could be used to identify an individual, or their family, by matching data sets to other sources of information.

UW considers the sharing of genomic data through NIH-designated repositories to involve minimal risk provided the criteria listed below are met. It is important to note that sharing of genomic information through NIH repositories that does not meet these criteria is not inherently more than minimal risk.

  • The expectations of the NIH GDS Policy or GWAS Policy are met;
  • There is not a high risk of re-identification; and
  • Results from secondary research using NIH data will not be returned to subjects.

Risks of Re-identification

Currently, NIH-designated repositories that share genomic data do not meet the definition of human subjects research under HHS regulations at 45 CFR 46 because the data submitted to the repositories are collected solely for other research studies, and because the data are coded and the identity of the individuals from whom the data were obtained will not be readily ascertainable to the investigators maintaining the repository.

NIH notes that this review and certification process goes beyond the requirements of 45 CFR 46. However, NIH has implemented these policy requirements due to concerns that the evolution of genomic technology and analytical methods could increase the risk of re-identification and consequently risks associated with inadvertent or inappropriate use or disclosure.

Technologies available within the public domain today, and expected technological advances, make the identification of specific individuals from their genomic information increasingly straightforward.

The number of DNA markers, such as single nucleotide polymorphisms (SNPs), that are needed to uniquely identify an individual is small. Data can be used with high certitude to confirm that two samples come from the same person. Nevertheless, the ease of identifying people from genomic data should not be overstated. Identification cannot be done without reference data and a high degree of expertise.

Examples of populations that may be at a higher risk of re-identification include:

  • Geographically defined communities;
  • Members of ultra-rare disease groups; and
  • Individuals who have engaged in illegal behavior (see below).

Risks Associated with the Freedom of Information Act (FOIA)

NIH-designated repositories are U.S. government records that are subject to the Freedom of Information Act. NIH is required to release government records unless the records are exempt from release under one of the FOIA exemptions.

NIH believes the release of certain information to be an unreasonable invasion of privacy under FOIA exemption 6, 5 U.S.C. §552(b)(6). Therefore, NIH foresees preserving the privacy of research participants and the confidentiality of genetic information by, for example, redacting individual-level genotype and phenotype data from any disclosures made in response to FOIA requests and denying requests for unredacted data.

Risks Associated with Law Enforcement

Although NIH repositories hold only coded data, it is conceivable that law enforcement agencies could ask for genomic information from the repositories and, for example, search for matches to DNA for forensic purposes. Law enforcement might also seek to compel disclosure of identifying information from the institution that holds it.

Release of identifiable information may be protected from compelled disclosure if a Certificate of Confidentiality is or was obtained for the original study. See GUIDANCE Certificate of Confidentiality.

Potential Harms to Individuals, Family Members, Specific Populations, Groups, and Communities

Harms that result from inappropriate use or disclosure of genomic data may include denial of employment or insurance.

The Genetic Information Nondiscrimination Act of 2008 (GINA) provides a baseline level of protection against genetic discrimination in the United States.

  • GINA is a federal law that prohibits discrimination in health coverage and employment based on genetic information.
  • GINA does not protect against discrimination in the context of life insurance, disability insurance, or long-term care insurance. GINA’s protections apply to “asymptomatic” individuals, not those who have manifested disease.

Harms may also include psychosocial harms such as stress, anxiety, stigmatization, or embarrassment resulting from disclosure of information about family relationships, ethnic heritage, or potentially stigmatizing conditions.

Research has shown that some populations demonstrate a higher predisposition to developing certain diseases or disorders than others. Genetic variants associated with physical disorders, diseases, and behavioral traits and causative variants will be found in all populations with differing frequencies. Higher or lower frequencies that contribute to observed health patterns, particularly those that can be viewed negatively, can lead to genetic stereotypes and stigmatization of a population group.

Return of Individual Research Results

Return of individual research results to participants from research using data shared through NIH repositories is expected to be an extremely rare occurrence. Nonetheless, the return of results must be carefully considered because the information can have a psychological impact (e.g., stress and anxiety) as well as implications for the participant’s health and well-being. While clinically valid and meaningful results can have a positive impact on an individual’s health, harms can occur if unvalidated research results are provided back to participants or used for medical decision-making.

Secondary investigators will not be able to return results directly to participants because they will not have access to the identities of these individuals. If a secondary investigator does generate clinically valid results of immediate clinical significance, they can only facilitate their return by contacting the contributing investigator who holds the key (if still maintained) to the code that identifies participants.

When links to identifying information are retained, individual participants should be given the option of choosing or declining to receive results. If participants are given the option of receiving results, researchers should be aware that results may be returned years after they have submitted the study data to NIH.

De-identification of Data is Consistent with GDS Policy

De-identification Requirements:

    • Data submitted to NIH-designated repositories must be de-identified and coded using a random, unique code.
    • The 18 identifiers enumerated in the HIPAA Privacy Rule and in the SUPPLEMENT Genomic Data Sharing must be removed.
    • Data should be de-identified such that the identities of the individuals from whom the data were collected cannot be readily ascertained or otherwise associated with the data by the NIH repository staff or secondary data users.
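The coding step described above can be illustrated with a minimal sketch. It assumes records are plain dictionaries and uses a hypothetical, partial set of identifier field names; the actual identifiers to remove are the 18 listed in the HIPAA Privacy Rule and in the SUPPLEMENT Genomic Data Sharing. The function name and the `participant_code` field are also illustrative only.

```python
import secrets

# Hypothetical field names standing in for direct identifiers. This is NOT
# the full list of 18 HIPAA identifiers; consult the SUPPLEMENT for that list.
IDENTIFIER_FIELDS = {"name", "address", "birth_date", "phone", "email", "ssn", "mrn"}


def code_records(records):
    """Split records into coded data for submission and a separately held key.

    Returns (coded_records, key), where key maps each random code back to the
    identifiers that were removed. The key stays with the contributing
    investigator and is never sent to the repository.
    """
    coded_records = []
    key = {}
    for record in records:
        # Draw a random code that carries no information about the participant.
        code = secrets.token_hex(8)
        while code in key:  # regenerate in the unlikely event of a collision
            code = secrets.token_hex(8)
        key[code] = {f: v for f, v in record.items() if f in IDENTIFIER_FIELDS}
        remaining = {f: v for f, v in record.items() if f not in IDENTIFIER_FIELDS}
        coded_records.append({"participant_code": code, **remaining})
    return coded_records, key
```

Removing named fields is only part of the requirement. The submitting institution must also have no actual knowledge that the remaining information could identify a participant, which a field filter like this cannot establish.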

Informed Consent

Consent Requirements and Expectations for Genomic Data Sharing

Use the worksheet Genomic Data Sharing Certification to identify the applicable consent requirements for genomic data sharing and to determine whether the requirements are met. If the consent requirements cannot be met, the data will not qualify for GDS certification.

  • When it is anticipated that GDS certification requirements cannot be met: investigators should state this in their DMS Plan and indicate what data, if any, can be shared and how. In some instances, the funding NIH Institute, Center, or Office (ICO) may need to determine whether to grant an exception to the data submission expectation under the GDS policy.
  • When genomic data from specimens created or collected after January 25, 2015 (effective date of GDS policy), lack consent for research use and data sharing: if there are compelling scientific reasons that necessitate the use of the genomic data, investigators should provide a justification in the funding request for their use. The funding NIH Institute or Center will review the justification and decide whether to make an exception to the consent requirement.
  • When the research is funded or supported by NHGRI and consent expectations cannot be met: NHGRI will grant exceptions to the consent expectation on a case-by-case basis. Information about how to request an exception can be found in the NHGRI GDS Policy FAQs.

Studies Involving Minors

If the study involves children, the IRB must consider the appropriateness of the continued maintenance and sharing of the data when the child reaches the legal age of consent.

In particular, it is important to consider whether consent should be obtained from the now-adult subject. When a link to identifiers is maintained, researchers must provide the subject with the opportunity to withdraw data from the NIH repositories, unless the IRB approves a waiver of the consent requirement for the now-adult subjects. See GUIDANCE Consent Protected and Vulnerable Populations for information about consent waivers.

Studies Involving Consent by Legally Authorized Representative (LAR)

If the study proposes to obtain consent from legally authorized representatives, the IRB must consider the issues related to LAR consent as described in GUIDANCE Consent Diminished or Fluctuating Consent Capacity and Legally Authorized Representative (LAR).

In particular, it is important to consider reconsent of subjects who regain the capacity to consent for themselves. When a link to identifiers is maintained, researchers must obtain consent from subjects who regain the capacity to consent and provide them with the opportunity to withdraw data from the NIH repositories, unless the IRB approves a waiver of the consent requirement.

Data Use Limitations

Consistency With Informed Consent

Through the Controlled Access process for providing data access to secondary users, mechanisms are in place to minimize the likelihood of usage of genomic data in ways that are inconsistent with the original informed consent. The IRB is expected to: (1) have reviewed all proposed submissions of data to NIH-designated repositories to ensure that the submission and subsequent sharing for research purposes are consistent with the informed consent of the study participants; (2) certify the appropriate research uses of the data; and (3) identify the specific data use limitations.

The IRB accomplishes this by reviewing the terms of the consent form and documenting any limitations to use of the data, as expressed in the consent form, in the Institutional Certification.

For example, if the consent form includes the possibility of data sharing but states that the data will only be used for the study or a particular disease, a disease specific data use limitation should be documented in the Institutional Certification unless subjects are re-consented for broader use of the data.

Four Main Categories of Limitations. (see NIH reference)

  • General research use. Data can be used for any research purpose but would not be made available for non-research purposes. These data would generally be made available to any qualified investigator.
  • Health/medical/biomedical. Use of these data is limited to a focus on health/biomedical research objectives, excluding the study of population origins or ancestry. These data would generally be made available to any qualified investigator.
  • Disease-specific. Data can only be used for research on a specific disease or a related condition. When informed consent documents allow the data to be used for future studies related only to a particular disease (e.g., diabetes and related conditions), a disease-specific limitation would be appropriate.
  • Other. These are data use limitations that are not included in the standard NIH categories that are specified by the certifying institution.

Modifiers to the Main Categories. The following limitations are modifiers of the four main categories:

  • Genetic studies only. Data can be used only for genetic studies. These may include research on the role of genetics in any disease, condition, or non-disease trait. These may also include research that could have implications for understanding ancestral history because of the information that it may provide about allele frequencies in different populations.
  • Methods. Data can be used for statistical methods research and development (e.g., development of statistical software or algorithms).
  • Not-for-profit use only. Data can be used only by not-for-profit organizations. If the data should not be made available to commercial entities, this restriction should be stated specifically as a data use limitation.
  • Publication required. Data can be used only if the secondary investigator will disseminate the study findings to the larger scientific community.
  • IRB approval required. Data can be used only with IRB approval from the secondary investigator’s institution. Documentation of local IRB approval, including a description of the type of review (e.g., full committee, expedited), would be submitted as part of the data access request.
  • Collaboration required. Data can only be used for collaborative research with the primary study investigator(s). A letter of collaboration must be submitted with the request for data.
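One way to picture how a main category combines with modifiers is a small record type. The sketch below is illustrative only: the class and field names are hypothetical and do not correspond to any NIH or UW system, and the labels simply mirror the category and modifier names listed above.

```python
from dataclasses import dataclass
from enum import Enum


class MainCategory(Enum):
    GENERAL_RESEARCH_USE = "General research use"
    HEALTH_MEDICAL_BIOMEDICAL = "Health/medical/biomedical"
    DISEASE_SPECIFIC = "Disease-specific"
    OTHER = "Other"


class Modifier(Enum):
    GENETIC_STUDIES_ONLY = "Genetic studies only"
    METHODS = "Methods"
    NOT_FOR_PROFIT_USE_ONLY = "Not-for-profit use only"
    PUBLICATION_REQUIRED = "Publication required"
    IRB_APPROVAL_REQUIRED = "IRB approval required"
    COLLABORATION_REQUIRED = "Collaboration required"


@dataclass(frozen=True)
class DataUseLimitation:
    """Exactly one main category, plus any number of modifiers."""
    category: MainCategory
    modifiers: frozenset = frozenset()
    detail: str = ""  # the disease, or a description of an "Other" limitation

    def __post_init__(self):
        # Disease-specific and Other limitations are meaningless without detail.
        needs_detail = (MainCategory.DISEASE_SPECIFIC, MainCategory.OTHER)
        if self.category in needs_detail and not self.detail:
            raise ValueError(f"{self.category.value} requires a detail description")

    def describe(self):
        head = self.category.value
        if self.detail:
            head += f" ({self.detail})"
        return "; ".join([head] + sorted(m.value for m in self.modifiers))
```

For example, a consent form that limits future use to diabetes research and requires local IRB approval could be recorded as `DataUseLimitation(MainCategory.DISEASE_SPECIFIC, frozenset({Modifier.IRB_APPROVAL_REQUIRED}), "diabetes and related conditions")`.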


Definitions

This section provides definitions for key Genomic Data Sharing concepts, as described in NIH Policies.

Coded: Any identifying information (such as name) that would enable the investigator to readily ascertain the identity of the individual to whom the private information or specimens pertain has been replaced with a number, letter, symbol, or combination thereof (i.e., the code) and a key to decipher the code exists, enabling linkage of the identifying information to the private information or specimens.

Controlled-access: Data are available to an investigator for a specific project only if certain stipulations are met.

dbGaP (database of Genotypes and Phenotypes): A central data repository at the National Center for Biotechnology Information (NCBI), a branch of the National Library of Medicine.

De-identified data: Data that have been de-identified according to the following criteria: the identities of data subjects cannot be readily ascertained or otherwise associated with the data by the repository staff or secondary data users (45 CFR 46.102(f)); the 18 identifiers enumerated at 45 CFR 164.514(b)(2) (the HIPAA Privacy Rule) are removed; and the submitting institution has no actual knowledge that the remaining information could be used alone or in combination with other information to identify the subject of the data. Note that this definition is specific to NIH’s Genomic Data Sharing policy.

Large-scale genomic data: The GDS Policy applies to all NIH-funded research that generates large-scale human or non-human genomic data as well as use of these data for subsequent research. Large-scale data include genome-wide association studies (GWAS), single nucleotide polymorphism (SNP) arrays, and genome sequence, transcriptomic, metagenomic, epigenomic, and gene expression data. Examples are included below. See Supplemental Information to the NIH Genomic Data Sharing Policy for more examples.

  • Sequence data from more than one gene or region of comparable size in the genomes of more than 1,000 human research participants
  • Sequence data from more than 100 genes or regions of comparable size in the genomes of more than 100 human research participants
  • Sequence data from more than 100 isolates from infectious organisms
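The three example thresholds above can be written as simple checks. This is a sketch under the assumption that a study can be summarized by counts of genes or regions, participants, and isolates; the function names are hypothetical. A result of False means only that none of these three examples matches, not that the GDS Policy does not apply.

```python
def human_sequence_matches_example(genes_or_regions, participants):
    """Check human sequence data against the two human examples listed above."""
    # More than one gene/region in more than 1,000 participants.
    if genes_or_regions > 1 and participants > 1000:
        return True
    # More than 100 genes/regions in more than 100 participants.
    if genes_or_regions > 100 and participants > 100:
        return True
    return False


def isolate_sequence_matches_example(isolates):
    """Check infectious-organism sequence data: more than 100 isolates."""
    return isolates > 100
```

Both thresholds are strict ("more than"), so a study with exactly 1,000 participants or exactly 100 isolates does not match these particular examples.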

NIH GWAS Data Repository: Also known as the “Database of Genotypes and Phenotypes (dbGaP)”, the NIH GWAS Data Repository is a database developed by the National Center for Biotechnology Information (a division of the National Library of Medicine) to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype.

NIH-designated repository: Any data repository maintained or supported by NIH either directly or through collaboration.

Unrestricted-access: Data are accessible to anyone via public website (previously referred to as “open access”).

UW IO: A Senior Official at the institution who is credentialed through the NIH eRA Commons system and is authorized to enter the institution into a legally binding contract and sign on behalf of an investigator who has submitted data or a data access request to NIH. The UW Institutional Official who has the authority to provide institutional certification for data sharing under the GWAS and GDS Policies is the Grant and Contract Administrator processing the award.

Related Materials

GUIDANCE Certificate of Confidentiality
GUIDANCE Human Subjects Regulations
SOP Genomic Data Sharing Certification – HSD Staff [HSD staff access only]
SOP Request for Genomic Data Sharing – Investigators
SUPPLEMENT Genomic Data Sharing
WORKSHEET Genomic Data Sharing Certification
WORKSHEETs Pre-Review [HSD staff access only]

Version Information

Version History

Version Number Posted Date Implementation Date Change Notes
1.7 03.28.2024 03.28.2024 Revise to note that when there is no consent, NIH will review requests to use genomic data collected after 1/25/15; retire GDS consent worksheet and roll relevant information into WORKSHEET GDS Certification; update NIH reference hyperlinks
1.6 01.27.2022 01.27.2022 Minor wordsmithing, moderate reorganization of content, and transfer content from app-based Word document to HTML webpage
1.5 06.24.2021 06.24.2021 Remove gendered terms; update formatting
1.4 01.03.2020 01.03.2020 Removed link to retired document
1.3 12.13.2019 12.13.2019 Updated links
Previous versions 10.08.2021 10.08.2021 For older versions: HSD staff see the SharePoint Document Library; Others – contact hsdinfo@uw.edu

Keywords: Ancillary review; GDS; Results