Scientists unravel the ocean’s mysteries with cloud computing, data science skills and a sea of data

 

This octopus lives nearly a mile deep on an active volcano in the ocean. High-definition cameras on the Ocean Observatories Initiative Cabled Array can provide high-resolution images of life deep in the ocean. Video credit: Nancy Penrose/UW; V05

By Elizabeth Sharpe
The internet of the ocean. That’s how UW oceanography professor Deborah Kelley describes the cabled suite of instruments tracking the inner workings of the ocean and streaming real-time, nonstop data to shore at the speed of light.

Running along the seafloor at depths of up to 10,000 feet, the fiber-optic cables span the Juan de Fuca Plate, with the farthest site 300 miles off the coast of Oregon. They carry countless rows and columns of numerical values and packets of video and sound recordings at a bandwidth of up to 10 gigabits per second.

Called the Cabled Array, it is part of the National Science Foundation’s Ocean Observatories Initiative (OOI), a system of integrated, scientific platforms and interactive sensors providing scientists the capability — through unprecedented access to physical, chemical, geological and biological data — to address critical issues that affect the relationship between the ocean and the Earth.

Deborah Kelley
Director of the Cabled Array and UW Oceanography Professor

“We’ve never had the technology in the ocean before to get these precise measurements and this level of spatial and temporal resolution in the data,” said Kelley, who is also director of the OOI Cabled Array.

All of the data are freely and publicly available. But the sheer volume and immense complexity of the data are major challenges, she adds. “We can’t use Excel. All the files are too large,” she said. “How do we visualize the data streaming in from more than 140 instruments in meaningful ways, to explore and understand the kinds of questions we’re looking at?”

For Kelley and other oceanographers, the stakes could not be higher.

The World Ocean is our planet’s life support system. It covers three-quarters of our world, supplies 80 percent of the oxygen, stores 50 times more carbon dioxide than the atmosphere, regulates our climate, supports a diversity of species, and produces energy, food, medicine and other resources crucial to sustaining life on Earth.

Unlocking its mysteries could help us better predict earthquakes, volcanic eruptions and tsunamis; discover new sources for energy; protect marine biodiversity and ecosystems; and understand the impacts of climate change and how to mitigate or adapt to the changes already underway.

That’s why oceanographers teamed up with data and research computing experts to organize a unique event at the University of Washington in late August 2018 to help ocean scientists learn the computational tools, techniques, data management and analytical skills needed to handle this massive amount of data.


Hacking the ocean

“Without data science methodologies and computational tools, scientists are at a disadvantage when it comes to making sense of so much data,” said Rob Fatland, director of research computing in UW Information Technology (UW-IT) and a co-organizer of the August event.

UW-IT cloud computing experts like Fatland, along with the eScience Institute’s experts in data science tools and methodologies, give scientists the support they need to advance their work.

Together, they joined about 50 ocean scientists who convened at the UW for Oceanhackweek, five intensive days of hands-on tutorials and collaborative investigations. The event was underwritten by more than $100,000 in grant funding from the Consortium for Ocean Leadership, the nonprofit organization that oversaw the OOI until October 2018 through a coalition of research institutions. The UW is among the organizations contracted to operate this massive endeavor for another five years, with a recent award from the National Science Foundation.

Oceanhackweek followed a February 2018 hackweek, organized by a small group of volunteers that included Fatland and Kelley, to explore data from the OOI Cabled Array, the undersea network of fiber-optic cable off the Pacific coast that, if laid end to end, would stretch across Washington state, all the way to Boise, Idaho.

[SLIDESHOW]

Ocean Observatories Initiative

The OOI is made up of integrated, scientific platforms and interactive sensors. NSF/OOI/UW CEV

OOI’s Cabled Array

The cabled network of sensors runs along the sea floor from Oregon. NSF/OOI/UW CEV

Underwater volcano

An HD camera records a volcanic summit in the ocean. UW/NSF-OOI/CSSF

The OOI Cabled Array is delivering data on a scale that was previously not possible. More than 140 instruments are working simultaneously: seismometers, hydrophones, echosounders, fluorometers, HD cameras, fluid samplers, mass spectrometers, and others. Sensors are measuring earthquakes, carbon dioxide, light, temperature and a whole host of other variables. High-resolution cameras are capturing deep-sea creatures, while hydrophones are recording digital songs from whales and dolphins.

At the summit of Axial Seamount, the largest and most active underwater volcano off our coast, 21 cabled instruments are measuring its seismic heartbeat and the inflation and deflation of its roof from oozing magma, sampling the fluids and the microbial DNA, and snapping high-resolution images of life forms that thrive on volcanic gases. Even if you’re investigating something as simple as temperature around an animal-covered hot spring at the summit of the volcano, explained Kelley, the instrument there is measuring 24 fluid temperatures continuously in three dimensions.

“Even for one day, how do you pull together and investigate these huge datasets to discover their secrets, leading to a better understanding of these kinds of dynamic environments? It’s a fire hose,” Kelley said.
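One general way to cope with a stream too large to load all at once is to summarize it incrementally, one reading at a time. The sketch below is an illustration only, not the method used by OOI or taught at the hackweek. It keeps a running mean and standard deviation with Welford's algorithm, and the `RunningStats` class, the `summarize` helper and the temperature values are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Running mean and variance (Welford's algorithm), using constant memory."""
    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        """Sample standard deviation; 0.0 until at least two values are seen."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def summarize(stream: Iterable[float]) -> RunningStats:
    """Consume a stream of readings without holding them all in memory."""
    stats = RunningStats()
    for value in stream:
        stats.update(value)
    return stats


# Synthetic readings standing in for one sensor's temperature stream.
readings = (2.0 + 0.001 * i for i in range(100_000))
result = summarize(readings)
print(f"n={result.count}, mean={result.mean:.3f}, std={result.std:.3f}")
```

Because each reading is discarded after it updates the totals, the same loop works whether the source is a small file or a feed that never stops.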

Friedrich Knuth, who spent three years at Rutgers on OOI’s data team, gave researchers the tools to tap into the data provided by OOI.

“My role as a data evaluator was to open the door,” he said, and put a nozzle on the data.

To make Oceanhackweek happen at UW, Knuth teamed with Wu-Jung Lee, a research associate at the Applied Physics Lab, and Valentina Staneva, a senior data scientist at the eScience Institute, who helped conceive the hackweek idea through their work on OOI data. They quickly garnered interest and support from others in organizing first the Cabled Array Hackweek and later Oceanhackweek. Organizers also included Amanda Tan (UW-IT), Don Setiawan (UW School of Oceanography), Anthony Arendt and Aaron Marburg (Applied Physics Lab), and Rachael Murray (eScience Institute).


Tutorials, research computing in the cloud

Oceanhackweek participants deep in conversation about data validation and simplifying access to oceanography data. Photo credit: Rachael Murray, UW eScience Institute

Participants hailed from academic institutions around the world and ranged from early-career to established scientists and engineers.

Amanda Tan set up a shared cloud computing environment where participants could access, work in and download all the tutorials. Tan, like Fatland, shares an appointment in UW-IT and the eScience Institute, and is a research computing cloud technology lead developer.

UW Information Technology supports world-class research by providing up-to-date tools and resources like cloud computing to help accelerate discoveries.

During a session she taught, Tan asked researchers if any of them had used cloud computing before, and only a few hands went up.

Tan listed the advantages of cloud computing: it is immediately available, with no waiting in line for resources and no need to buy, manage or maintain computer equipment and servers. The technology is built on familiar operating systems and software applications, and it is secure. Plus, cloud computing is elastic and scalable.

When a researcher asked about cost, Tan said you only pay for what you use, from mere pennies up to $16 an hour.
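To make the pay-for-what-you-use idea concrete, here is a minimal sketch of the arithmetic. The $16 hourly rate is the upper figure Tan quoted. The $0.05 rate, the eight-hour job length and the `cloud_cost` function are assumptions made for illustration. Real cloud bills typically also include charges such as storage and data transfer, which this sketch ignores.

```python
def cloud_cost(hourly_rate_usd: float, hours: float, instances: int = 1) -> float:
    """Estimate compute cost at a flat hourly rate (illustrative only)."""
    if hourly_rate_usd < 0 or hours < 0 or instances < 0:
        raise ValueError("rate, hours and instances must be non-negative")
    return hourly_rate_usd * hours * instances


# Hypothetical eight-hour analysis on a small machine and on a large one.
small_job = cloud_cost(hourly_rate_usd=0.05, hours=8)
large_job = cloud_cost(hourly_rate_usd=16.00, hours=8)

print(f"Small machine: ${small_job:.2f}")
print(f"Large machine: ${large_job:.2f}")
```

Once the machines are shut down, the compute charges stop, which is the main contrast with buying and maintaining hardware.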

Tan’s tutorial, along with the others, was recorded and posted online to encourage collaboration and share knowledge with those who could not attend.


Open access, open data, open science

Anthony Arendt, who has joint appointments with the UW’s Applied Physics Lab and the UW’s eScience Institute, recently co-authored a paper published in the Proceedings of the National Academy of Sciences on the experience of developing and coordinating hackweeks. In it, he explored how they can be a model for data science education and fostering research collaboration. He has also developed what amounts to a “cookbook” on the logistics of running a hackweek, available to anyone.

To Arendt, hackweeks are about facilitating and democratizing access to the data through skills training and open source tools. About 80 percent of the time is spent on getting at the data, and 20 percent is spent on doing the science.

Participants in hackweeks become ambassadors, sharing what they’ve learned, and often continuing the collaborations they started.

Hackweek organizers and participants are champions of open, reproducible science, even while recognizing that sharing new data and discoveries can be at odds with the competitiveness of research publications and grant funding.

Yet that long-established view is changing, as evidenced by the National Science Foundation’s policy on open data, the open-access institutional repositories used by many North American universities, and efforts toward shared data standards and persistent URLs for data and code.

“I grew up saying the data’s mine because that’s how we got promoted. That’s how we made our reputation. My paper, my data,” Kelley said. “This is not how you’re going to make the big breakthroughs anymore.”

Instead, Kelley said major discoveries will come with “having all these new eyes on data and people with different expertise working together collaboratively and coming up with tools, technologies and insights that one individual could never do, and then sharing them with the rest of the world.”

The partnership with the eScience Institute and UW-IT has been invaluable, Kelley explains. The data science and research computing expertise are helping ocean scientists access and learn the tools they need to wrangle the data and accelerate their research.

“It’s a testament to the UW vision,” Kelley said, speaking of the close collaboration between the School of Oceanography, the Applied Physics Lab, and others that led to the development of the Cabled Array and to the first-ever hackweeks for oceanography.

“UW has some amazing resources,” she said, “and that’s why I’m so glad the hackweek was here. You bring people with new eyes, from very different backgrounds, which results in different ways of thinking about the data. That’s very exciting.”

Learn more:
Find out about the research computing tools and solutions available as well as individual consultation tailored to your research needs.