Magdalena Balazinska
CSE 599
Seattle Campus
Studies of emerging areas and specialized topics in computer science.
Class description
Scientists today face an avalanche of data. Oceanographers generate terabytes of data with daily forecasts of temperature, elevation, and velocity. Astronomers acquire hundreds of millions of images from increasingly powerful telescopes. Physicists are already discussing petabyte-scale datasets collected from particle accelerators. Biologists have sequenced the human genome, itself a large dataset, and are now describing the complex interactions among all 20,000 to 80,000 protein-encoding genes, not to mention the interactions among the proteins they encode. In all cases, scientists' ability to collect data has outpaced their ability to manage it. Add non-standard data types, extreme performance demands, and ever-changing requirements, and you have one of the major data management challenges of today. What do these applications have in common, and why are traditional data management tools inadequate? In this course, we will investigate these questions from the perspective of modern database research. We will look at what scientific datasets in different domains have in common and what sets them apart. We will survey the literature in this area and work with tools used in practice.
Student learning goals
General method of instruction
Recommended preparation
Class assignments and grading