Seeking to unravel DNA
UW graduate student seeks to unravel the mysteries of the human genome using the cloud
When Timothy Durham looks at the human genome, he sees an encyclopedia of precise instructions that tell approximately 31 trillion cells in the human body how to do their jobs.
Figuring out how cells read and interpret these instructions—and how they can misread them—could help researchers unravel the mysteries of what leads to disease and point to cures. This complicated, ongoing work is being performed by thousands of researchers across the globe.
Over the past decade, their efforts have produced large amounts of rich data. So when Durham, a graduate student and researcher in the William Stafford Noble Lab in the UW Department of Genome Sciences, decided to join the research, he found that a desktop computer and small department servers would not be up to the task.
That’s why he turned to University of Washington Information Technology’s Research Computing experts, who recommended a cloud computing solution to do his work. The cloud, Durham said, provided him with virtually unlimited resources for computation, storage, networking and data management, the sort of tools he needed to build a complex three-dimensional model that would capture the state of the genome in different cell types. The model, he hopes, will help other researchers advance the field of genomics.
Interpreting the human genome has been a tremendous challenge. It is like looking at a cookbook written in a foreign language with its own unique rules of grammar. In this cookbook, Durham said, genes are like “recipes” that cells use to construct the machinery they need to function.
“Now, we are starting to learn the language and the grammar of the genome, which is like learning to read the recipes and to understand which ones work well together and how the cell decides what to make,” he said.
The ultimate goal is to be able to understand how the genome is used in different types of cells in the body to answer questions such as, “Which genes are important to the function of skin cells versus liver cells?”
And in the same way that a cook doesn’t make every recipe in a cookbook when planning a meal, specific kinds of cells only care about certain subsets of genes when they are doing their work.
“If we can understand how cells pick the genes they need out of all 20,000 genes in the genome cookbook, it will have a profound impact on the way we understand human biology and disease,” Durham said.
The Noble Lab is a perfect place for Durham’s work. The lab develops and applies computational techniques for modeling and understanding biological processes at the molecular level. Machine learning, a subfield of computer science focused on the study and construction of algorithms that can learn from and make predictions on data, is an important area of research there, and Durham relied on its principles to develop his model.
“I am developing a model that captures the state of the genome across 127 different cell types. The full data set is more than 2 TB, which is more than the memory capacity of our entire lab cluster,” Durham said.
UW-IT set up Durham with Microsoft Azure and Amazon Web Services, which offer cloud services to the University of Washington. To help fund this, Durham applied for awards from Amazon’s Cloud Credits for Research program and from Microsoft’s Azure for Research program, and was granted $30,000 in cloud research credits, an extremely valuable contribution that helped accelerate his work.
“Research funding is not easy to come by, so the credit program is really valuable,” Durham said. “It helps you through the initial learning curve involved in moving to the cloud by removing some of the risk of adopting a new technology and allowing you an extended trial period in which you can really dive deep to see how well it works for your application.”
Before Durham moved to the cloud, he was using lab servers, and even one of his smallest processing runs would take up to two full weeks to complete, said Rob Fatland, a UW-IT Research Computing Director. Fatland offers consulting and support to researchers looking at cloud computing solutions or other innovative tools offered at the UW, such as Hyak, the University’s shared cluster supercomputer.
“When he was using the department servers for his work, no one else could use them,” Fatland said. “In the cloud, he reduced processing time to hours without the restrictions that come with shared resources.”
Large-scale cloud computing for research is relatively new to the University, but it is quickly establishing itself as a valuable tool, Fatland said. When talking to researchers, he discusses security, management and cost to operate in the cloud.
Fatland said many researchers who have switched to the cloud have found that it is more cost-effective for many types of computing, with costs decreasing over time. It is also extremely secure, so they don’t have to worry about losing their work. And it offers an elastic environment, allowing researchers to scale up their work instantly.
“It’s an empowering technology,” Fatland said.
That has been the case for Durham, whose goal for his three-dimensional model is to predict what parts of the genome are most important in a particular cell type, such as a liver or a heart cell.
“It is challenging to train one of these computing models,” he said. “You have to do a lot of fine tuning and it takes a lot of computing time to optimize it, a lot of trial and error.” But with the cloud, he doesn’t have to wait for anyone else to finish their work. It is always available when he needs it.
“In the end, if we can predict the most relevant portions of the genome in a particular cell type, this can help us zero in on specific regions of the genome that might, for example, harbor mutations that can contribute to disease,” he said.