[Colloquium] Computer Science Seminar - March 5, 2014

Sandra Wallace swallace at cs.uchicago.edu
Thu Feb 27 12:05:38 CST 2014


Jennie Duggan
Massachusetts Institute of Technology (MIT)

Date:  Wednesday, March 5, 2014

Time:  2:30 PM

Place: Ryerson 251
 
Title: “Large-Scale Array Data Management for Science Applications”
 

Abstract: 
Science applications are becoming increasingly data-driven. Researchers are collecting new data at an unprecedented scale, and much of it is stored in multidimensional arrays. The resulting workloads consist of complex transformations, many of which query the data spatially. The established relational model of data management cannot support this new class of applications. At the same time, scientists are increasingly conducting their experiments on large, shared-nothing clusters in lieu of purpose-built platforms. As a result, processor time is becoming more plentiful and network bandwidth is the scarcer resource.
 
In this talk, I will describe my research on efficiently distributing arrays for scientific workloads. This work is done in the context of SciDB, an open-source array database system built for applications with complex analytics. I will first present our optimization of data-intensive queries to minimize their use of network resources. Our approach uses integer linear programming to assign segments of a distributed query to individual database nodes. The second part of my talk will present research on data placement for elastic array databases. Our partitioning scheme minimizes the time needed to reorganize the database after a change in the hardware configuration, while optimizing the layout of multidimensional data structures for spatial queries.

Bio:
Jennie Duggan is a postdoctoral associate at the Massachusetts Institute of Technology, where she works with Michael Stonebraker. She received her Ph.D. from Brown University in December 2012 under the guidance of Ugur Cetintemel. Her research interests include scientific data management, database workload modeling, and cloud computing. She is especially focused on making data-driven science applications fast and scalable.

Host:  Prof. Ian Foster
 

*Refreshments will be served after the talk at 3:30 pm in Ryerson 255*

