Talk by Joel Saltz on Wednesday, 1 November

Mon Oct 23 11:38:17 CDT 2000

Department of Computer Science/The University of Chicago

Ryerson Hall -- 1100 E. 58th Street

COLLOQUIUM ANNOUNCEMENT

Wednesday, 1 November at 2:30 in Ryerson 251
(To be followed by refreshments in Ryerson 255)

Joel Saltz
Professor of Computer Science, University of Maryland
Director, Division of Informatics and Professor, Department of Pathology,
Johns Hopkins University

Title: Programming Tools for Large Dataset Subsetting, Aggregation and
Visualization

Abstract: Increasingly powerful computers, clusters and multiprocessor
machines have enabled computational scientists and engineers to model
biomedical and physical phenomena in great detail. As a result,
overwhelming amounts of data are being generated by biomedical, scientific
and engineering simulations. In addition, large amounts of data are being
generated by sensors of various sorts, such as radiological imaging
devices, microscopes and sensors on board satellites. The exploration and
analysis of the resulting large datasets play an increasingly important
part in many domains of scientific research. In this presentation we
describe the design and development of software systems that address the
need to subset, explore, analyze, process and visualize large datasets.

The first software system, the Active Data Repository (ADR), targets large
disk-based datasets in processing environments with multiple processors
and multiple disks. ADR is used to develop data servers that invoke
client-specified, user-defined reduction functions over portions of a
distributed data structure selected by range queries. We will describe and
characterize methods for coordinating work, data movement and dataset
tiling. The current version of ADR is implemented as a C++ class library.
A compiler and runtime infrastructure is being developed to support users
in the high-level specification of range queries and user-defined
functions.
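
As a rough illustration of the idea of applying a user-defined reduction
to the part of a dataset selected by a range query, here is a minimal
single-process sketch in Python. It is not ADR's API (ADR is a C++ class
library whose interface is not given in this announcement); the names
Chunk, intersects and range_query_reduce are hypothetical, and selection
is done per chunk by bounding box rather than per element.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical 2-D bounding box: (x_min, y_min, x_max, y_max), inclusive.
Box = Tuple[int, int, int, int]


@dataclass
class Chunk:
    """Hypothetical storage unit: a bounding box plus the values it holds."""
    bounds: Box
    values: List[float]


def intersects(a: Box, b: Box) -> bool:
    """True if the two inclusive boxes overlap in both dimensions."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query_reduce(
    chunks: Iterable[Chunk],
    query: Box,
    combine: Callable[[float, float], float],
    initial: float,
) -> float:
    """Fold `combine` over the values of every chunk that overlaps `query`."""
    acc = initial
    for chunk in chunks:
        if intersects(chunk.bounds, query):
            for value in chunk.values:
                acc = combine(acc, value)
    return acc


chunks = [
    Chunk((0, 0, 9, 9), [1.0, 5.0, 3.0]),
    Chunk((10, 0, 19, 9), [7.0, 2.0]),
    Chunk((20, 0, 29, 9), [9.0]),
]
# The query box overlaps the first two chunks only.
print(range_query_reduce(chunks, (5, 5, 12, 8), max, float("-inf")))  # 7.0
```

The reduction is passed in as an ordinary function, so the same query can
compute a maximum, a sum or any other associative combination.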

The second, closely related software system is a middleware
infrastructure called DataCutter. It provides support for subsetting
datasets through multi-dimensional range queries, along with support for
invoking a sequence of user-defined filtering and aggregation functions.
Processing, network and data copying overheads are minimized by the
ability to place filtering and aggregation functions on different
platforms.

DataCutter supports pipelined processing and storage allocation needed to 
incrementally process and merge very large datasets.
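
The notion of a sequence of user-defined filters applied in a pipeline
can be sketched with Python generators. This is an illustrative
assumption, not DataCutter's interface: run_pipeline, keep_in_range and
running_sum are hypothetical names, and all stages run in one process, so
the placement of filters on different platforms is not modelled.

```python
from typing import Callable, Iterable, Iterator, List

# A stage consumes a stream of blocks and yields a new stream.
Stage = Callable[[Iterator], Iterator]


def run_pipeline(source: Iterable, stages: List[Stage]) -> Iterator:
    """Chain the stages lazily; blocks flow through one at a time."""
    stream: Iterator = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


def keep_in_range(lo: int, hi: int) -> Stage:
    """Filtering stage: keep only values with lo <= value <= hi."""
    def stage(stream: Iterator) -> Iterator:
        for block in stream:
            yield [v for v in block if lo <= v <= hi]
    return stage


def running_sum(stream: Iterator) -> Iterator:
    """Aggregation stage: merge each block into a running total."""
    total = 0
    for block in stream:
        total += sum(block)
        yield total


blocks = [[1, 20, 3], [40, 5], [6, 7]]
result = run_pipeline(blocks, [keep_in_range(1, 10), running_sum])
print(list(result))  # [4, 9, 22]
```

Because each stage is a generator, only one block is held in memory per
stage at a time, which is the property that matters when the full dataset
is too large to load at once.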

====================================================
Margery Ishmael
Department of Computer Science
1100 E. 58th St.
Chicago, IL 60637

tel: 773 834-8977  fax: 773 702-8487

marge at cs.uchicago.edu 