Genomic Data Sharing
- Purpose and Applicability
- Overall Considerations for Institutional Certification
- Consistency with Applicable Laws & Policies
- Data Collection is Consistent with 45 CFR 46
- Consideration of Risks
- De-identification Requirements
- Informed Consent
- Data Use Limitations
- Related Materials
- Regulatory References
- Version Table
Purpose and Applicability
This webpage provides guidance to IRB members, HSD staff, and researchers about the review of: (1) research involving plans for sharing genomic data with NIH-designated repositories; and (2) requests for certification of the data.
Overall Considerations for Institutional Certification
The institutional certification should state whether the data will be submitted to an unrestricted or controlled-access database.
The institutional certification should assure:
- Consistency with applicable law, regulation, and policy. The data submission is consistent, as appropriate, with applicable national, tribal, and state laws and regulations as well as relevant institutional policies. (details below)
- Data limitations are delineated. Any limitations on research use of the data, as expressed in the informed consent documents, are delineated. (details below)
- Data is confidential. The identities of research participants will not be disclosed to NIH-designated repositories. (details below)
- IRB review. An IRB, privacy board, and/or equivalent body, as applicable, has reviewed the investigator’s proposal for data submission and assures that:
  - The protocol for the collection of genomic and phenotypic data is consistent with 45 CFR 46. (details below)
  - Data submission and subsequent data sharing for research purposes are consistent with the informed consent of the study participants from whom the data were obtained. (details below)
  - Consideration was given to risks to individual participants and their families associated with data submitted to NIH-designated repositories and subsequent sharing. (details below)
  - To the extent possible, consideration was given to risks to groups or populations associated with submitting data to NIH-designated repositories and subsequent sharing. (details below)
  - The investigator’s plan for de-identifying data sets is consistent with the standards outlined in the NIH GDS policy. (details below)
Consistency with Applicable Laws and Policies
HSD staff identify applicable laws and policies as they would for review of any application (GUIDANCE Human Subjects Regulations; WORKSHEET Pre-Review).
Applicable policies include HSD SOPs and may include UW Privacy Policies.
Applicable laws frequently include HHS human subjects protections regulations (45 CFR 46), FDA human subjects protection regulations (21 CFR Parts 50 and 56), and the Health Insurance Portability and Accountability Act Privacy Rule (45 CFR Part 160 and Part 164, Subparts A and E).
Data submission must also be consistent with applicable tribal laws when the data are from American Indian and Alaska Native peoples. For example, tribal nations have jurisdiction over research conducted on tribal lands with tribal citizens. In general, the IRB relies on the researcher to provide relevant information about tribal laws as requested in the IRB Protocol form.
Data Collection is Consistent with 45 CFR 46
Data collection procedures must be consistent with HHS human subjects protections regulations.
- Prospectively collected data. This is verified either by the UW IRB review process or by relying on an external IRB for review.
- Retrospectively collected data. This is verified by confirming that data were collected with IRB approval.
Consideration of Risks
The IRB considers the risks associated with the genomic information in the event of re-identification and disclosure. The IRB considers ways to minimize those risks within the context of the expected benefits of broad sharing.
The IRB also considers the extent to which genomic information associated with the participant could be used to identify an individual, or their family, by matching data sets to other sources of information.
UW considers the sharing of genomic data through NIH-designated repositories to involve minimal risk provided the criteria listed below are met. It is important to note that sharing of genomic information through NIH repositories that does not meet these criteria is not inherently more than minimal risk.
- The expectations of the NIH GDS Policy or GWAS Policy are met;
- There is not a high risk of re-identification; and
- Results from secondary research using NIH data will not be returned to subjects.
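The three criteria above can be read as a simple conjunction. The following is a minimal sketch of that reading, not an HSD tool; the class and field names are hypothetical. Consistent with the note above, failing a criterion does not mark the sharing as more than minimal risk; it only means the minimal-risk presumption does not apply and the IRB assesses the risk level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingPlan:
    """Hypothetical summary of a data sharing plan (illustrative field names)."""
    meets_gds_or_gwas_expectations: bool
    high_reidentification_risk: bool
    secondary_results_returned: bool


def presumed_minimal_risk(plan: SharingPlan) -> bool:
    """Return True only when all three listed criteria are satisfied."""
    return (
        plan.meets_gds_or_gwas_expectations
        and not plan.high_reidentification_risk
        and not plan.secondary_results_returned
    )


def risk_note(plan: SharingPlan) -> str:
    """Describe the outcome without implying that unmet criteria mean higher risk."""
    if presumed_minimal_risk(plan):
        return "presumed minimal risk"
    return "no presumption; IRB assesses the risk level"


typical = SharingPlan(True, False, False)
rare_disease = SharingPlan(True, True, False)
print(risk_note(typical))
print(risk_note(rare_disease))
```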
Risks of Re-identification
Currently, NIH-designated repositories that share genomic data do not meet the definition of human subjects research under HHS regulations at 45 CFR 46 because the data submitted to the repositories are collected solely for other research studies, and because the data are coded and the identity of the individuals from whom the data were obtained will not be readily ascertainable to the investigators maintaining the repository.
NIH notes that this review and certification process goes beyond the requirements of 45 CFR 46. However, NIH has implemented these policy requirements due to concerns that the evolution of genomic technology and analytical methods could increase the risk of re-identification and consequently risks associated with inadvertent or inappropriate use or disclosure.
Technologies available within the public domain today, and expected technological advances, make the identification of specific individuals from their genomic information increasingly straightforward.
The number of DNA markers, such as single nucleotide polymorphisms (SNPs), that are needed to uniquely identify an individual is small. Data can be used with high certitude to confirm that two samples come from the same person. Nevertheless, the ease of identifying people from genomic data should not be overstated: identification cannot be done without reference data and a high degree of expertise.
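To illustrate why the number of markers is small, the sketch below works through an idealized calculation. It assumes independent biallelic SNPs, unrelated individuals, Hardy-Weinberg genotype proportions, and a minor allele frequency of 0.5; these assumptions and the function names are illustrative and are not taken from NIH policy. Real marker panels need more SNPs than this idealized bound, and a match still requires reference data to compare against.

```python
import math


def genotype_match_probability(maf: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions: p^2, 2pq, q^2.
    """
    p, q = 1.0 - maf, maf
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def snps_needed(population: float, maf: float = 0.5) -> int:
    """Smallest n such that a chance match across n independent SNPs
    is no more likely than 1 in `population`."""
    per_snp = genotype_match_probability(maf)
    return math.ceil(math.log(population) / -math.log(per_snp))


# Idealized count for a population of 8 billion people.
print(snps_needed(8e9))
```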
Examples of populations that may be at a higher risk of re-identification include:
- Geographically defined communities;
- Members of ultra-rare disease groups;
- Individuals who have engaged in illegal behavior (see below).
Risks Associated with the Freedom of Information Act (FOIA)
NIH-designated repositories are U.S. government records that are subject to the Freedom of Information Act. NIH is required to release government records unless the records are exempt from release under one of the FOIA exemptions.
NIH believes the release of certain information to be an unreasonable invasion of privacy under FOIA exemption 6, 5 U.S.C. § 552(b)(6). Therefore, NIH foresees preserving the privacy of research participants and the confidentiality of genetic information by, for example, redacting individual-level genotype and phenotype data from any disclosures made in response to FOIA requests and denying requests for unredacted datasets.
Risks Associated with Law Enforcement
Although NIH-designated repositories hold only coded data, it is conceivable that law enforcement agencies could ask for genomic information from the repositories and, for example, search for matches to DNA for forensic purposes. Law enforcement might also seek to compel disclosure of identifying information from the institution that holds it.
Release of identifiable information may be protected from compelled disclosure if a Certificate of Confidentiality is or was obtained for the original study. See GUIDANCE Certificate of Confidentiality.
Potential Harms to Individuals, Family Members, Specific Populations, Groups, and Communities
Harms that result from inappropriate use or disclosure of genomic data may include denial of employment or insurance.
The Genetic Information Nondiscrimination Act of 2008 (GINA) provides a baseline level of protection against genetic discrimination in the United States.
- GINA is a federal law that prohibits discrimination in health coverage and employment based on genetic information.
- GINA does not protect against discrimination in the context of life insurance, disability insurance, or long-term care insurance. GINA’s protections apply to “asymptomatic” individuals, not those who have manifested disease.
Harms may also include psychosocial harms such as stress, anxiety, stigmatization, or embarrassment resulting from disclosure of information about family relationships, ethnic heritage, or potentially stigmatizing conditions.
Research has shown that some populations demonstrate a higher predisposition to developing certain diseases or disorders than others. Genetic variants associated with physical disorders, diseases, and behavioral traits and causative variants will be found in all populations with differing frequencies. Higher or lower frequencies that contribute to observed health patterns, particularly those that can be viewed negatively, can lead to genetic stereotypes and stigmatization of a population group.
Return of Individual Research Results
Return of individual research results to participants from research using data shared through NIH-designated repositories is expected to be an extremely rare occurrence. Nonetheless, the return of results must be carefully considered because the information can have a psychological impact (e.g., stress and anxiety) as well as implications for the participant’s health and well-being. While clinically valid and meaningful results can have a positive impact on an individual’s health, harms can occur if unvalidated research results are provided back to participants or used for medical decision-making.
Secondary investigators will not be able to return results directly to participants because they will not have access to the identities of these individuals. If a secondary investigator does generate clinically valid results of immediate clinical significance, they can only facilitate their return by contacting the contributing investigator who holds the key (if still maintained) to the code that identifies participants.
When links to identifying information are retained, individual participants should be given the option of choosing or declining to receive results. If participants are given the option of receiving results, researchers should be aware that results may be returned years after they have submitted the study data to NIH.
De-identification of Data is Consistent with GDS Policy
- Data submitted to NIH-designated repositories must be de-identified and coded using a random, unique code.
- The 18 identifiers enumerated in the HIPAA Privacy Rule and in the SUPPLEMENT Genomic Data Sharing must be removed.
- Data should be de-identified such that the identities of the individuals from whom the data were collected cannot be readily ascertained or otherwise associated with the data by the NIH repository staff or secondary data users.
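The coding step described above can be sketched as follows. This is a minimal illustration, not an approved de-identification procedure: the field names in `IDENTIFIER_FIELDS` are hypothetical stand-ins, and the authoritative list is the 18 HIPAA identifiers together with SUPPLEMENT Genomic Data Sharing. The sketch assigns each record a random, unique code, strips the identifier fields, and returns the key separately so that it can stay with the submitting institution.

```python
import secrets

# Hypothetical field names standing in for direct identifiers.
# The authoritative list is the 18 HIPAA identifiers.
IDENTIFIER_FIELDS = {"name", "address", "birth_date", "phone", "email", "mrn"}


def deidentify(records):
    """Return (coded_records, key).

    coded_records: copies of the input with identifier fields removed and a
        random subject code added.
    key: maps each code to the removed identifiers; it is never submitted.
    """
    coded_records, key = [], {}
    for record in records:
        code = secrets.token_hex(8)  # random, not derived from any identifier
        while code in key:           # regenerate on the unlikely collision
            code = secrets.token_hex(8)
        key[code] = {f: v for f, v in record.items() if f in IDENTIFIER_FIELDS}
        shared = {f: v for f, v in record.items() if f not in IDENTIFIER_FIELDS}
        shared["subject_code"] = code
        coded_records.append(shared)
    return coded_records, key


records = [
    {"name": "Example Person A", "mrn": "0001", "genotype": "AA", "phenotype": "case"},
    {"name": "Example Person B", "mrn": "0002", "genotype": "AG", "phenotype": "control"},
]
coded, key = deidentify(records)
print(coded)
```

Removing listed fields is only part of the standard: the submitting institution must also have no actual knowledge that the remaining information could identify a subject, which a field filter cannot check.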
Consent Requirements and Expectations for Genomic Data Sharing
Use the WORKSHEET Consent Requirements and Expectations for Genomic Data Sharing to identify the applicable consent requirements for genomic data sharing and to determine whether the requirements are met.
- If the consent requirements cannot be met, the same worksheet can be used to determine if the conditions are met for an exception to the consent requirements. Note that exceptions must also be approved by a member of the HSD Management Team.
- If neither the consent requirements nor conditions for an exception can be met, the data cannot be certified unless subjects are re-consented for genomic data sharing.
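The decision path described above can be summarized in a short sketch. It is one reading of the text, not a substitute for the worksheet; the function and enum names are hypothetical. It assumes that an exception without HSD Management Team approval is treated the same as no exception.

```python
from enum import Enum


class Outcome(Enum):
    CERTIFY = "consent requirements met"
    CERTIFY_WITH_EXCEPTION = "exception conditions met and approved"
    RECONSENT_REQUIRED = "cannot certify unless subjects are re-consented"


def consent_outcome(
    requirements_met: bool,
    exception_conditions_met: bool,
    management_approved: bool,
) -> Outcome:
    """Walk the consent checks in the order the guidance lists them."""
    if requirements_met:
        return Outcome.CERTIFY
    # An exception needs both the worksheet conditions and management approval.
    if exception_conditions_met and management_approved:
        return Outcome.CERTIFY_WITH_EXCEPTION
    return Outcome.RECONSENT_REQUIRED


print(consent_outcome(False, True, True).value)
```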
Studies Involving Minors
If the study involves children, the IRB must consider the appropriateness of the continued maintenance and sharing of the data when the child reaches the legal age of consent.
In particular, it is important to consider whether consent should be obtained from the now-adult subject. When a link to identifiers is maintained, researchers must provide the subject with the opportunity to withdraw data from the NIH-designated repositories, unless the IRB approves a waiver of the consent requirement for the now-adult subjects. See GUIDANCE Consent Protected and Vulnerable Populations for information about consent waivers.
Studies Involving Consent by Legally Authorized Representative (LAR)
If the study proposes to obtain consent from legally authorized representatives, the IRB must consider the issues related to LAR consent as described in GUIDANCE Consent Diminished or Fluctuating Consent Capacity and Legally Authorized Representative (LAR).
In particular, it is important to consider re-consent of subjects who regain the capacity to consent for themselves. When a link to identifiers is maintained, researchers must obtain consent from subjects who regain the capacity to consent and provide them with the opportunity to withdraw data from the NIH-designated repositories, unless the IRB approves a waiver of the consent requirement.
Data Use Limitations
Consistency With Informed Consent
Through the Controlled Access process for providing data access to secondary users, mechanisms are in place to minimize the likelihood of usage of genomic data in ways that are inconsistent with the original informed consent. The IRB is expected to: (1) have reviewed all proposed submissions of data to NIH-designated repositories to ensure that the submission and subsequent sharing for research purposes are consistent with the informed consent of the study participants; (2) certify the appropriate research uses of the data; and (3) identify the specific data use limitations.
The IRB accomplishes this by reviewing the terms of the consent form and documenting any limitations to use of the data, as expressed in the consent form, in the Data Use Limitations table in the GDS section of the WORKSHEET Genomic Data Sharing Certification (which is ultimately included in the Institutional Certification).
For example, if the consent form includes the possibility of data sharing but states that the data will only be used for the study or a particular disease, a disease specific data use limitation should be documented in the Institutional Certification unless subjects are re-consented for broader use of the data.
Four Main Categories of Limitations. (see NIH reference)
- General research use. Data can be used for any research purpose but would not be made available for non-research purposes. These data would generally be made available to any qualified investigator.
- Health/medical/biomedical. Use of these data is limited to a focus on health/biomedical research objectives, excluding the study of population origins or ancestry. These data would generally be made available to any qualified investigator.
- Disease-specific. Data can only be used for research on a specific disease or a related condition. When informed consent documents allow the data to be used for future studies related only to a particular disease (e.g., diabetes and related conditions), a disease-specific limitation would be appropriate.
- Other. These are data use limitations that are not included in the standard NIH categories that are specified by the certifying institution.
Modifiers to the Main Categories. The following limitations are modifiers of the four main categories:
- Genetic studies only. Data can be used only for genetic studies. These may include research on the role of genetics in any disease, condition, or non-disease trait. These may also include research that could have implications for understanding ancestral history because of the information that it may provide about allele frequencies in different populations.
- Methods. Data can be used for statistical methods research and development (e.g., development of statistical software or algorithms).
- Not-for-profit use only. Data can be used only by not-for-profit organizations. If the data should not be made available to commercial entities, this restriction should be stated specifically as a data use limitation.
- Publication required. Data can be used only if the secondary investigator will disseminate the study findings to the larger scientific community.
- IRB approval required. Data can be used only with IRB approval from the secondary investigator’s institution. Documentation of local IRB approval, including a description of the type of review (e.g., full committee, expedited), would be submitted as part of the data access request.
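One way to picture how a main category combines with modifiers is the data structure sketched below. The class names and the validation rule (a disease-specific or "other" limitation needs accompanying detail text) are illustrative assumptions, not an NIH or HSD schema.

```python
from dataclasses import dataclass
from enum import Enum


class MainCategory(Enum):
    GENERAL_RESEARCH_USE = "General research use"
    HEALTH_MEDICAL_BIOMEDICAL = "Health/medical/biomedical"
    DISEASE_SPECIFIC = "Disease-specific"
    OTHER = "Other"


class Modifier(Enum):
    GENETIC_STUDIES_ONLY = "Genetic studies only"
    METHODS = "Methods"
    NOT_FOR_PROFIT_ONLY = "Not-for-profit use only"
    PUBLICATION_REQUIRED = "Publication required"
    IRB_APPROVAL_REQUIRED = "IRB approval required"


@dataclass(frozen=True)
class DataUseLimitation:
    category: MainCategory
    modifiers: frozenset = frozenset()
    detail: str = ""  # disease name, or free text for OTHER

    def __post_init__(self):
        needs_detail = (MainCategory.DISEASE_SPECIFIC, MainCategory.OTHER)
        if self.category in needs_detail and not self.detail:
            raise ValueError(f"{self.category.value} requires detail text")

    def describe(self) -> str:
        label = self.category.value
        if self.detail:
            label += f" ({self.detail})"
        return "; ".join([label, *sorted(m.value for m in self.modifiers)])


dul = DataUseLimitation(
    MainCategory.DISEASE_SPECIFIC,
    frozenset({Modifier.NOT_FOR_PROFIT_ONLY, Modifier.IRB_APPROVAL_REQUIRED}),
    "diabetes and related conditions",
)
print(dul.describe())
```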
Definitions
This section provides definitions for key Genomic Data Sharing concepts, as described in NIH Policies.
Coded: Any identifying information (such as name) that would enable the investigator to readily ascertain the identity of the individual to whom the private information or specimens pertain has been replaced with a number, letter, symbol, or combination thereof (i.e., the code) and a key to decipher the code exists, enabling linkage of the identifying information to the private information or specimens.
Controlled-access: Data are available to an investigator for a specific project only if certain stipulations are met.
dbGaP (database of Genotypes and Phenotypes): A central data repository at the National Center for Biotechnology Information (NCBI), a branch of the National Library of Medicine.
De-identified data: Note that this definition is specific to NIH’s Genomic Data Sharing policy. Data that have been de-identified according to the following criteria: the identities of data subjects cannot be readily ascertained or otherwise associated with the data by the repository staff or secondary data users (45 CFR 46.102(f)); the 18 identifiers enumerated at 45 CFR 164.514(b)(2) (the HIPAA Privacy Rule) are removed; and the submitting institution has no actual knowledge that the remaining information could be used alone or in combination with other information to identify the subject of the data.
Large-scale genomic data: The GDS Policy applies to all NIH-funded research that generates large-scale human or non-human genomic data as well as use of these data for subsequent research. Large-scale data include genome-wide association studies (GWAS), single nucleotide polymorphism (SNP) arrays, and genome sequence, transcriptomic, metagenomic, epigenomic, and gene expression data. Examples are included below. See Supplemental Information to the NIH Genomic Data Sharing Policy for more examples.
- Sequence data from more than one gene or region of comparable size in the genomes of more than 1,000 human research participants
- Sequence data from more than 100 genes or regions of comparable size in the genomes of more than 100 human research participants
- Sequence data from more than 100 isolates from infectious organisms
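The three example thresholds above can be expressed as simple checks. This sketch covers only those three examples; the function names are hypothetical, and the Supplemental Information to the NIH Genomic Data Sharing Policy lists additional cases that are not modeled here.

```python
def human_sequence_is_large_scale(genes: int, participants: int) -> bool:
    """Apply the two human sequence-data examples listed above.

    `genes` counts genes or regions of comparable size.
    """
    many_participants = genes > 1 and participants > 1000
    many_genes = genes > 100 and participants > 100
    return many_participants or many_genes


def isolates_are_large_scale(isolates: int) -> bool:
    """Apply the infectious-organism example: more than 100 isolates."""
    return isolates > 100


print(human_sequence_is_large_scale(genes=5, participants=2500))
print(human_sequence_is_large_scale(genes=150, participants=80))
print(isolates_are_large_scale(120))
```

All thresholds are strict ("more than"), so exactly 1,000 participants or exactly 100 genes does not meet an example on its own.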
NIH GWAS Data Repository: Also known as the “database of Genotypes and Phenotypes (dbGaP)”, the NIH GWAS Data Repository is a database developed by the National Center for Biotechnology Information (a division of the National Library of Medicine) to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype.
NIH-designated repository: Any data repository maintained or supported by NIH either directly or through collaboration.
Unrestricted-access: Data are accessible to anyone via public website (previously referred to as “open access”).
UW IO: A Senior Official at the institution who is credentialed through NIH eRA Commons system and is authorized to enter the institution into a legally binding contract and sign on behalf of an investigator who has submitted data or a data access request to NIH. The UW Institutional Official who has the authority to provide institutional certification for data sharing under the GWAS and GDS Policies is the Grant and Contract Administrator processing the award.
Related Materials
GUIDANCE Certificate of Confidentiality
GUIDANCE Human Subjects Regulations
SOP Genomic Data Sharing Certification – HSD Staff [HSD staff access only]
SOP Request for Genomic Data Sharing – Investigators
SUPPLEMENT Genomic Data Sharing
WORKSHEET Consent Requirements and Expectations for Genomic Data Sharing
WORKSHEET Genomic Data Sharing Certification [HSD staff access only]
WORKSHEET Pre-Review [HSD staff access only]
Regulatory References
- NIH Genome-Wide Association Studies (GWAS) Policy
- NIH GWAS FAQs
- NIH Genomic Data Sharing (GDS) Policy
- NIH GDS FAQs
- NIH Points to Consider for Institutions and Institutional Review Boards in Submission and Secondary Use of Human Genomic Data under the National Institutes of Health Genomic Data Sharing Policy
- Supplemental Information to the NIH Genomic Data Sharing Policy
- NIH Standard Data Use Limitations
- NIH Points to Consider in Developing Effective Data Use Limitations Statements
- NHGRI Informed Consent for Genomic Research
- Lowrance WW and Collins FS. “Identifiability in Genomic Research”, Science vol 317, no 5838, 2007.
- Wendler DS and Rid A. “Genetic research on biospecimens poses minimal risk”, Trends in Genetics vol 31, no 1, 2015.
Version Table
| Version Number | Posted Date | Implementation Date | Change Notes |
|---|---|---|---|
| 1.6 | 01/27/2022 | 01/27/2022 | Minor wordsmithing, moderate reorganization of content, and transfer of content from app-based Word document to HTML webpage |
| 1.5 | 06/24/2021 | 06/24/2021 | Remove gendered terms; update formatting |
| 1.4 | 01/03/2020 | 01/03/2020 | Removed link to retired document |
| Previous versions | 10/08/2021 | 10/08/2021 | For older versions: HSD staff see the SharePoint Document Library; Others – contact firstname.lastname@example.org |
Key words: Ancillary review; GDS; Results