Data Science Masters

June 10, 2021

Meet the Organizers of the COVID-19 Hackathon

Recently, we caught up with current MSDS student Sepi Dibay and recent MSDS alumna Deepthi Hegde to talk about the successful COVID-19 Hackathon they organized in summer 2020.


Bios:

Sepideh (Sepi) Dibay immigrated to the United States in 2009 and earned her Master of Public Health and Ph.D. in Epidemiology. She did her postdoctoral research at Fred Hutchinson Cancer Research Center and is currently interning at Amazon as a Research Scientist. Sepi has extensive experience designing and conducting research and analyzing observational and experimental data. She is also pursuing a Master of Science in Data Science at UW to enhance her proficiency and expand her domain versatility.

 


Deepthi Hegde is a data scientist who is passionate about building scalable, real-time products. She is currently with Microsoft and is a recent graduate of the Master of Science in Data Science program at UW. While at UW, she did research internships at Google and Nike, focusing on deep learning applications in computer vision. Before that, she was a researcher at Carnegie Mellon University, where she worked on various machine learning projects. In her free time, she loves to mentor students on interviewing and finding jobs in data science.

 

What motivated you to organize the COVID-19 Hackathon?

Sepi has always been concerned with researching population health. When COVID-19 emerged and the shutdown happened, we were looking for a way to contribute to studying this new phenomenon. We tried different channels, such as volunteering for the Washington State Department of Health, but because everything was new and the deadly virus was spreading so quickly, we could not find a meaningful way to contribute. So we decided to organize an online hackathon to support the cause and harness the power of data science by engaging people who were interested in helping the community.

It was two months into the pandemic, and the situation wasn’t getting any better. We were tired of staying home and were looking for meaningful ways to contribute to the cause in whatever way we could. As data scientists, we believed in the power of data to help combat the situation. We thought that by coming together as a community, combining research efforts, and sharing insights, we could create more impact than any of us could individually.

 

How many participants were there in total? 

More than 100 students signed up, and 42 participants made a submission.


How many teams submitted projects?

13 teams.


The event was virtual because of the COVID-19 pandemic. How did the fact that it was completely online impact the event?

The online nature of the event made way for participation from several different countries and time zones. 

This was a new way of organizing a hackathon, and it required a lot of coordination and planning to spread the word and engage participants. Although the online format limited our ability to collaborate in person, it definitely helped us attract participants from around the world. We were also able to draw on the expertise of data scientists in different states to offer several workshops on the first day of the hackathon.


What platforms did you use to host the hackathon? Can you describe how participants and teams were able to participate virtually?

We used Slack extensively to communicate with participants. We asked teams to use GitHub to store and present their projects. We also used Zoom to launch the hackathon. The hackathon lasted two days, and during that time organizers and participants got together to check in via Zoom. We also held several workshops.


Who were the judges? 

  • Tim Randolph, Associate Member at Fred Hutchinson Cancer Research Center 
  • Anna Talman Rapp, Program Officer at the Bill & Melinda Gates Foundation
  • Duncan Wadsworth, Data Scientist at Microsoft
  • Ying Li, Chief Scientist at Giving Tech Labs

 

What kinds of datasets did teams use?

We provided two datasets, and in the spirit of open-ended research and creativity, participants were also allowed to use any other public dataset.

 

Describe the award categories.

Track I: Best Storytelling/Data-Science Process (text-heavy)

  • Clear hypotheses and assumptions (20)
  • Exploratory data analysis (20)
  • Problem solving (20)
  • Comprehensive take-aways (20)
  • Reproducibility (20)

Track II: Best Prediction Model (numbers-heavy)

  • Problem setup and metric definition (20)
  • Quality of features (20)
  • Explanation of choice of model (20)
  • Model evaluation (20)
  • Explainability and model interpretation (20)

Track III: Best Interactive Visualization/Dashboard (visual-heavy)

  • Simplicity and ease of navigation (20)
  • Choice of encodings and colors (20)
  • Ease of understanding (20)
  • Impact and take-aways (20)
  • Documentation (20)


Describe the winning teams’ projects below.

The Unpredictables won the prediction model category. This group investigated the impact of government policies on COVID-19 infection rates in the three states with the highest number of cases at the time (California, New York, and Pennsylvania).

Curious Duo won the storytelling category. This group focused its analysis on two states, Washington and Florida. The objective was to collect tweets from users in each state, identify state-specific sentiment trends, and examine how those trends affected the spread of COVID-19.

Data visualization had two winners: 

Java’s Just Coffee built a visualization that lets users interactively explore COVID-19 data on the number of cases and deaths, along with the policies governments have focused on to counteract the pandemic. The visualization also lets users explore how people in the United States have responded to COVID-19.

JiaLiDun built a visualization showing the effectiveness of governments’ policy responses to the COVID-19 pandemic in different countries. This group looked at three major categories of policies: containment and closure policies, economic policies, and health system policies. Within each category, the team also took different levels of stringency into consideration.


Is there anything else that you would like to share? 

We also conducted two workshops:

  • Intro to NLP (natural language processing) by Grishma Jena, Data Scientist at IBM
  • Intro to Time Series by Stanislav Panev, Project Scientist at Carnegie Mellon University