Population Health

October 15, 2025

An Initiative-funded team examines the ethics of a key genomic dataset

University of Washington researchers are leading new research on the ethical and social implications of using legacy datasets in population health research. Developed through a collaboration between UW’s Department of Bioethics & Humanities and the Department of Biostatistics, the project focuses specifically on the Human Genome Diversity Project (HGDP), a dataset featuring immortalized cell lines from 52 distinct human populations.

HGDP drew significant criticism when it was established in the 1990s, particularly from Indigenous communities and advocacy groups, who raised concerns about its inadequate consent procedures, its potential for commercialization and its lack of any meaningful community engagement or benefit-sharing agreements. Despite the controversy surrounding this dataset, data derived from the cell lines have been widely integrated into population health research.

Dr. Stephanie M. Fullerton, lead investigator and UW professor of Bioethics & Humanities, was inspired to conduct the research by experiences working in the National Institutes of Health-supported Polygenic Risk Methods in Diverse Populations (PRIMED) consortium.

“PRIMED was facing a dilemma on whether to continue using the HGDP dataset as reference panel in consortium work, and there were both statistical and ethical questions as to the pros and cons,” explains Dr. Fullerton. “We were also motivated by the observation that many researchers who routinely use HGDP aren’t aware of the historical context and associated controversies.”

Co-principal investigator Dr. Sarah C. Nelson, a senior research scientist in the UW’s Department of Biostatistics, expressed surprise at how far HGDP has evolved since the 1990s. “Compared to contemporary data collections, it’s surprising that a sample of around 1,000 individuals assembled decades ago to primarily study population genetics has become such a gold standard for representing genetic diversity across much of biomedical research,” remarks Dr. Nelson.

HGDP’s evolution in modern research brings forth many important questions about consent and data sovereignty, particularly regarding Indigenous data rights and community control over genetic information. Current discussions around Indigenous data sovereignty stress that Indigenous communities must maintain control over data collection, ownership, and application of research that involves their genetic materials and cultural knowledge. “Where it is difficult to determine what biospecimen donors were told at the time of donation, the ongoing use of genetic information derived from such samples should be better controlled or perhaps curtailed altogether,” argues Dr. Fullerton.

To explore these questions, the researchers are conducting a comprehensive literature review and qualitative interviews with key informants about HGDP data use. The project, funded by the University of Washington’s Population Health Initiative, takes an interdisciplinary approach, which the researchers consider a strength of the work. “The tradeoffs and implications of whether or not the genomic research community continues to use HGDP cut across science and ethics, so it’s important to have those various perspectives and research experiences reflected in the research team,” says Dr. Nelson.

The team recognizes that Indigenous scholars should take the lead in guiding the direction of this research. When asked about the ultimate impact and recommendations of their work, the researchers acknowledge, “It is too early to tell. First, we must publish our literature review and interview study findings and see what Indigenous scholars, and others with an interest in Indigenous data sovereignty, make of our results.”

The team hopes that its work will help inform policy recommendations about the use of controversial datasets in population health research. “We hope that all investigators, whether they are collecting biospecimens de novo, or working with de-identified derived data, will take seriously the interests of the research participants who made those data available for scientific use,” says Dr. Fullerton. “Where those interests are unknown, the stakes may be too high to ensure ongoing respectful distribution and use.”