[Colloquium] Talk by Tanu Malik on Friday, May 11th, 2007

Margery Ishmael marge at cs.uchicago.edu
Tue May 8 15:00:39 CDT 2007


DEPARTMENT OF COMPUTER SCIENCE - TALK

Date: Friday, May 11, 2007
Time: 2:30 p.m.
Place: Ryerson 251 (1100 E. 58th St.)

-------------------------------------------

Speaker:  TANU MALIK, Johns Hopkins University

Web page:  http://www.cs.jhu.edu/~tmalik/

Title: Large-scale Data Management for the Sciences

Traditional enterprises and novel scientific applications
are accumulating petabyte-scale datasets, which makes
the need for large-scale data management more pressing
than ever. Geographic distribution of the datasets,
accompanied by complex demands on the data, makes large-scale
data management challenging. This is especially true for
sciences that model complex physical and biological
phenomena using data from multiple sources.

In this talk I will address two critical problems in
scientific data management: combining a large number of
diverse data sources for execution of scientific queries,
and executing data-intensive scientific queries efficiently,
in terms of both network and I/O, on these data sources.
I will present SkyQuery--a system that federates data from
several petabyte-size, autonomous, and heterogeneous astronomy
databases scattered worldwide. Using SkyQuery, scientists can
write declarative queries that compare and merge multiple
astronomical datasets. For efficient query execution and
scalability, I will present Bypass-Yield Caching--a novel
caching framework for database systems that dramatically
reduces the network bandwidth requirements of data-intensive
federations such as SkyQuery, making them good network citizens.
Distributed applications such as the Bypass-Yield Cache often
rely on a priori knowledge of query cardinalities to make
optimization decisions. In this context, I will present a
black-box approach to selectivity estimation that is suitable
for distributed applications.

The success of SkyQuery and its adoption by the National Virtual
Observatory illustrate how data management systems can enable
scientific endeavors.

***The talk will be followed by refreshments in Ryerson 255***

-------------------------------------------------------

Host:  Ian Foster

People in need of assistance should call 773-834-8977 in advance.

For information on future CS talks: http://www.cs.uchicago.edu/events

