UW Information Technology

UW researcher uses high-performance computing to understand how online communities work — and just maybe how to keep them from sliding into oligarchy


What makes a successful online community? And what makes them fail? These are key questions that underlie social research conducted by the Community Data Science Collective.

By Ignacio Lobos

In the world of online communities, it is fair to say Benjamin Mako Hill has been wildly successful.

He was part of the small group that founded Ubuntu, a Linux-based operating system built through open source development that is today one of the biggest names in the Linux world, with a global online community of hundreds of thousands of people. But many other online communities he created, alone and with others, didn’t do nearly as well.

“Did I just get lucky with some of my attempts at community building?” Hill, an assistant professor in the Department of Communication, often asks himself. “There was a lot of online community building but no science of those communities,” he said.


Today, Hill seeks to understand how and why some attempts at collaborative production, like Wikipedia and Linux, build large and eager volunteer communities that produce high-quality work, while so many others fail to attract any members at all.

Answering these questions requires analyzing enormous amounts of data produced by wikis and Linux communities, so Hill relies on Hyak, the UW’s supercomputer.

“With English Wikipedia, we have many terabytes of information, with nearly 1 billion revisions to almost 6 million pages. That’s just way too much data for most computers, but not for Hyak,” he said. “We can crunch the full dataset in a matter of hours or a few days.”

And English Wikipedia is only one of the hundreds of thousands of wikis he studies with the help of Hyak.

Hill does most of his research as a member of the Community Data Science Collective, an interdisciplinary research group drawn from the UW Department of Communication and the Northwestern University Department of Communication Studies. Hill cofounded the group, which is housed at the UW.

“These online groups produce tons of data,” Hill said. “I knew I could make the data speak and answer some of these questions.”

UW supercomputer helps researchers understand dynamics of online groups

The group’s use of Hyak underscores the increasing value of big data and quantitative analysis in the social sciences.

“At the UW, the use of supercomputers like Hyak is at the center of a revolution of information in the social sciences,” said Nam Pho, director of Research Computing in UW Information Technology (UW-IT). Pho, who oversees Hyak at the UW, said more social scientists are relying on big data to understand human societies and how people interact. Data science and supercomputing are now indispensable skills for social scientists, he said.

At the UW, there are broad efforts to make computing tools more readily accessible to researchers in the social sciences, said Jim Pfaendtner, associate vice provost for Research Computing.

“We continue to work with researchers on the design of new research computing solutions to meet the needs of data scientists and researchers,” he said.

Testing the “iron law of oligarchy” on the net

In one of his earliest projects at the UW, Hill set out to test whether the famous political theory known as the “iron law of oligarchy” also applies to online communities, where peer collaboration and production are the raison d’être. The theory, developed by the German sociologist Robert Michels, holds that all complex organizations, no matter how democratic they are at the start, eventually become oligarchies.

Hill and Aaron Shaw, an associate professor in Northwestern University’s School of Communication, used exhaustive data from nearly 700 wikis to track how those communities behaved over time. Would they remain open and collaborative? Were they really laboratories of democracy?

To answer these questions, they constructed empirical measures of participation and used them to test whether the wikis showed the growing concentration of power that marks an oligarchy. Their conclusions, published in a paper, were not encouraging: many of the peer production projects showed “entrenched leadership and deep inequalities, suggesting that they may not fulfill democratic ideals.”
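The article does not say which participation measures Hill and Shaw built. Purely as an illustration of what "testing for increases" in inequality can look like, the sketch below uses the Gini coefficient, a standard inequality index that we have swapped in as an assumption (it is not necessarily a measure from their paper). It computes how unequally edits are spread across contributors in each time period, then fits a simple least-squares slope to see whether inequality is rising. The input format, a list of hypothetical (editor, period) pairs with one pair per edit, and the function names `gini`, `gini_by_period` and `trend` are all illustrative.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini coefficient of non-negative contribution counts.

    0.0 means every contributor made the same number of edits; values near
    the maximum of (n - 1) / n mean a few contributors made nearly all edits.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Formula on ascending-sorted values with ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


def gini_by_period(revisions):
    """Edit-count inequality per period.

    `revisions` is a hypothetical list of (editor, period) pairs, one per edit.
    Returns a dict mapping each period (in sorted order) to its Gini coefficient.
    """
    per_period = defaultdict(Counter)
    for editor, period in revisions:
        per_period[period][editor] += 1
    return {p: gini(list(c.values())) for p, c in sorted(per_period.items())}


def trend(series):
    """Ordinary least-squares slope of the values against their position (0, 1, 2, ...)."""
    ys = list(series.values())
    n = len(ys)
    if n < 2:
        return 0.0
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Toy data: three editors share the work evenly at first, then "ana" dominates.
    revisions = [
        ("ana", 2005), ("ben", 2005), ("cy", 2005),
        ("ana", 2006), ("ana", 2006), ("ben", 2006),
        ("ana", 2007), ("ana", 2007), ("ana", 2007), ("ben", 2007),
    ]
    by_year = gini_by_period(revisions)
    print(by_year)         # Gini rises from 0.0 to about 0.17 to 0.25
    print(trend(by_year))  # positive slope (0.125): inequality is increasing
```

A positive slope here is only descriptive. A real study of hundreds of wikis would also need proper statistical modeling and more than one measure of who holds power, which is beyond this sketch.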

However, their work showed a glimmer of hope. There were some wiki groups that appeared “more robustly democratic,” signaling “digital technologies, like their offline counterparts, might — or might not — be used to create participatory democratic organizations.” Their paper, he said, opened new avenues of inquiry.

“Oligarchies don’t have to rule,” Hill said. “Sometimes, we see online communities that are largely resistant to undemocratic changes. What are they doing right? How are they socializing their members? How can we learn from them?”

Building on this research, the Community Data Science Collective has earned national recognition. In the summer of 2019, the group received a $497,754 grant from the National Science Foundation to build an “ecological theory” of how online communities affect each other in “mutualistic and competitive ways.”

And just before the grant was announced, Hill received a 2019 Research Symbiont Award for sharing scientific data beyond the expectations of his field: data he extracted from Scratch, a programming environment and online community for young people, and shared with other researchers.

Hill’s contribution sought to make it easier for researchers to “study how young people learn, create, communicate and interact in informal learning environments — especially around computer programming.”

Pulling this data from the web is no easy task, Hill said. Researchers need to create custom software to download and scrape web pages for information.

“The internet has given us the opportunity to try new angles on old problems,” Hill said.

Communication, even online, is at the center of cultural identity and expression. Studying what happens to online communities, how they communicate and how they evolve or fail, can help us better understand society at large, he said.

“But that means getting used to working with complex tools and approaches,” Hill said. “We got Hyak, and we are cranking through as much data as we can. We are just getting started.”

Learn more: Doing research at the UW? Hyak can help. The Community Data Science Collective offers public data science workshops. Get involved.