How to request datasets from dbGaP and other federal repositories
Federal Data Repositories
Security and operational standards must be in place before you can use controlled-access data from a federal repository. You may need to identify an IT environment that meets these standards before accessing the data. You can request assistance via the UWIT intake form (UW NetID required).
For NIH Controlled-Access Data Repositories, review the following:
- Required Security and Operational Standard for NIH Controlled-Access Data: NOT-OD-25-159.
- List of NIH Controlled-access Data Repositories.
dbGaP Overview
The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, and molecular diagnostic assays, as well as studies of associations between genotype and non-clinical traits. High-throughput, cost-effective methods for genotyping and sequencing now make it possible to generate the massive amounts of genotypic data these analyses require.
dbGaP provides two levels of access, open and controlled, to allow broad release of non-sensitive data while providing oversight and investigator accountability for sensitive datasets involving personal health information.
Before you begin a Data Access Request
Review the following guidance.
- Authorized users must be employees of the University of Washington. If your team includes users at other organizations, they need to request access through their own institutions.
- For dbGaP, you will need appropriate system credentials in eRA Commons. If you need access, review Commons Roles at the UW.
- Review NIH How to Request and Access Datasets from dbGaP.
- Review UWIT guidance on Computing for restricted access data.
- Identify whether you will use an existing UW environment, set up a new secure environment through UW, or use a third-party environment with UWIT approval. In some cases, it may be possible to use an NIH-hosted environment (e.g., AnVIL or BioData Catalyst).
- Consult with the authorized IT Director for the environment you plan to use. If you will be using an NIH-hosted environment, please reach out to your department’s IT administrator.
If you are working with controlled-access data:
- Complete the required training on Research Security: Protecting NIH Controlled Access Genomic Data.
- Save the completion certificate to include with your SAGE request.
- An authorized IT Director for the environment being used must provide confirmation that the IT environment meets the NIH Security Best Practices for Users of Controlled-Access Data.
- You will need an Assurance, signed by the Approved User, stating that the NIH Security Best Practices can be met.
Start your Data Access Request (DAR)
After reviewing the previous guidance, follow these steps to begin your DAR.
- Choose the datasets you wish to access.
- Some datasets require IRB approval. See the Human Subjects Division guidance on obtaining IRB approval.
- Select the Signing Official: choose the authorized institutional official.
- Your OSP reviewer will reassign the Signing Official to themselves after receiving the accompanying SAGE request. See the steps under Prepare your SAGE Request to OSP.
- In the DAR, list the authorized IT Director who has firsthand knowledge of the IT environment you intend to use. This is the same person who signs the IT Director Confirmation.
- If you are using a cloud computing IT environment (UW Government Community Cloud, or UW GCC), upload the UW Cloud Computing IT Environment Statement into the DAR.
- Read the attestation language.
- Add any other attachments required by NIH, such as IRB approval.
- Read and agree to the terms and conditions as the “Approved User”:
- Investigators and their institutions are responsible for safeguarding the accessed datasets. Pay close attention to the Data Use Certification (DUC) being made by you as an Approved User.
- Review and approve the Data Access Request so it begins routing to the Signing Official.
- Download a copy of the DAR, then proceed to the next steps to prepare your SAGE request to OSP.
Prepare your SAGE Request to OSP
The type of SAGE request depends on whether your DAR is associated with an existing sponsored program. If it is associated with a sponsored program, route an OSP & GCA Modification Request (MOD) in SAGE.
If it is not associated with a sponsored program, route a Non-award Agreement (NAA) eGC1.
- Prepare and route the request in SAGE:
- For a MOD, select the “Federal Repository Data Access and Submission” subcategory.
- For an eGC1, select the “Non-Award Agreement (new)” application type if you are requesting access to new data, or “Non-Award Agreement (continuation)” if it is a renewal request for data you have already been using.
- Attach the following to your SAGE request:
- A copy of the DAR.
- A copy of the signed IT Director Confirmation from the IT Director named in your DAR.
- The Assurance, signed by the Approved User, stating that the NIH Security Best Practices can be met.
- If the dataset you wish to access requires IRB approval, a copy of the IRB approval.
- If applicable, a copy of the completion certificate for the required training on Research Security: Protecting NIH Controlled Access Genomic Data.
- OSP will review the Award Modification Request (MOD) or NAA eGC1 together with the DAR in eRA Commons.
- Check the status on the “My Requests” page in eRA Commons.
Signing Official (OSP) Review
OSP confirms that:
- The DAR is complete.
- An authorized IT Director is identified.
- A signed confirmation statement from the IT Director is attached in SAGE to the NAA eGC1 or Award Modification.
- The Assurance statement signed by the Approved User is attached to the SAGE item.
- If the IT environment used is “GCC High”, the PI has uploaded the UW Cloud Computing Statement to the DAR.
- IRB approval, if needed, is attached to the DAR and corresponds to the study in question.