Talk by Joel Saltz on Wednesday, 1 November
Margery Ishmael
marge at cs.uchicago.edu
Mon Oct 23 11:38:17 CDT 2000
Department of Computer Science/The University of Chicago
Ryerson Hall -- 1100 E. 58th Street
COLLOQUIUM ANNOUNCEMENT
Wednesday, 1 November at 2:30 in Ryerson 251
(To be followed by refreshments in Ryerson 255)
Joel Saltz
Professor of Computer Science, University of Maryland
Director, Division of Informatics and Professor, Department of Pathology,
Johns Hopkins University
Title: Programming Tools for Large Dataset Subsetting, Aggregation and
Visualization
Abstract: Increasingly powerful computers, clusters and multiprocessor
machines have enabled computational scientists and engineers to model
biomedical and physical phenomena in great detail. As a result,
overwhelming amounts of data are being generated by biomedical, scientific
and engineering simulations. In addition, large amounts of data are being
generated by sensors of various sorts, such as radiological imaging
devices, microscopes and sensors on board satellites. The exploration and
analysis of the resulting large datasets play an increasingly important
part in many domains of scientific research. In this presentation we
describe the design and development of software systems that address the
need to subset, explore, analyze, process and visualize large datasets.
The first software system, the Active Data Repository (ADR), targets large
disk-based datasets in processing environments with multiple processors and
multiple disks. ADR is used to develop data servers that invoke
client-specified, user-defined reduction functions over the portions of a
distributed data structure selected by a range query. We will describe and
characterize methods for coordinating work, data movement and dataset
tiling. The current version of ADR is implemented as a C++ class library. A
compiler and runtime infrastructure is being developed to support users in
the high-level specification of range queries and user-defined functions.
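As a rough illustration of the idea of applying a user-defined reduction
over a range-query-selected portion of a chunked dataset, here is a
minimal single-process sketch in Python. It is not ADR's C++ API: the
names Chunk, range_reduce, init, accumulate and combine are hypothetical,
the data is two-dimensional for brevity, and the distribution of chunks
across processors and disks is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[Point, Point]  # (lower corner, upper corner), both inclusive


@dataclass
class Chunk:
    """Hypothetical dataset chunk: a bounding box plus (point, value) items."""
    bounds: Box
    items: List[Tuple[Point, float]]


def intersects(a: Box, b: Box) -> bool:
    # Two boxes overlap if their extents overlap in every dimension.
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d] for d in range(2))


def contains(box: Box, p: Point) -> bool:
    return all(box[0][d] <= p[d] <= box[1][d] for d in range(2))


def range_reduce(chunks: List[Chunk], query: Box,
                 init: Callable[[], float],
                 accumulate: Callable[[float, float], float],
                 combine: Callable[[float, float], float]) -> float:
    """Apply a user-defined reduction to the items inside the query box."""
    partials = []
    for chunk in chunks:
        if not intersects(chunk.bounds, query):
            continue  # the chunk's bounding box rules it out; skip it
        acc = init()
        for point, value in chunk.items:
            if contains(query, point):
                acc = accumulate(acc, value)
        partials.append(acc)  # one partial result per selected chunk
    result = init()
    for part in partials:
        result = combine(result, part)
    return result


# Made-up example data, for illustration only.
chunks = [
    Chunk(((0.0, 0.0), (1.0, 1.0)), [((0.5, 0.5), 3.0), ((0.9, 0.1), 7.0)]),
    Chunk(((2.0, 2.0), (3.0, 3.0)), [((2.5, 2.5), 10.0)]),
    Chunk(((1.0, 1.0), (2.0, 2.0)), [((1.5, 1.5), 4.0)]),
]
query = ((0.0, 0.0), (1.6, 1.6))
total = range_reduce(chunks, query,
                     init=lambda: 0.0,
                     accumulate=lambda acc, v: acc + v,
                     combine=lambda a, b: a + b)
```

Keeping a separate partial result per chunk is meant to suggest how partial
reductions could be computed independently and combined afterwards; that
only works if the reduction is associative and init() returns its identity
element.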
The second, closely related software system is a middleware
infrastructure called DataCutter, which provides support for subsetting
datasets through multi-dimensional range queries, along with support for
invoking a sequence of user-defined filtering and aggregation functions.
Processing, network and data copying overheads are minimized by the ability
to place filtering and aggregation functions on different platforms.
DataCutter supports the pipelined processing and storage allocation needed
to incrementally process and merge very large datasets.
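The notion of a sequence of filtering and aggregation functions applied
in a pipeline can be sketched with Python generators. This is only an
illustration under stated assumptions, not DataCutter's interface: the
function names read_chunks, range_filter and partial_sums are hypothetical,
records are (x, y, value) tuples, and all stages run in one process, so
placement of filters on different platforms is not shown.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[float, float, float]  # (x, y, value); layout is an assumption


def read_chunks(records: List[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Source stage: hand out the dataset one fixed-size chunk at a time."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def range_filter(chunks: Iterable[List[Record]],
                 x_range: Tuple[float, float],
                 y_range: Tuple[float, float]) -> Iterator[List[Record]]:
    """Filtering stage: keep only records inside a 2-D range query."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    for chunk in chunks:
        kept = [r for r in chunk
                if x_lo <= r[0] <= x_hi and y_lo <= r[1] <= y_hi]
        if kept:
            yield kept  # forward only the data that survived the filter


def partial_sums(chunks: Iterable[List[Record]]) -> Iterator[float]:
    """Aggregation stage: reduce each chunk to a single partial sum."""
    for chunk in chunks:
        yield sum(r[2] for r in chunk)


# Made-up example data, for illustration only.
records = [(0.0, 0.0, 1.0), (1.0, 1.0, 2.0), (5.0, 5.0, 100.0),
           (2.0, 2.0, 3.0), (9.0, 0.0, 50.0)]
stages = partial_sums(range_filter(read_chunks(records, 2),
                                   (0.0, 2.0), (0.0, 2.0)))
partials = list(stages)   # chunks flow through the stages one at a time
total = sum(partials)     # final merge of the partial results
```

Because each stage is a generator, only one chunk is in flight at a time,
and filtering before aggregating reduces the amount of data passed to the
later stages.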
====================================================
Margery Ishmael
Department of Computer Science
1100 E. 58th St.
Chicago, IL 60637
tel: 773 834-8977 fax: 773 702-8487
marge at cs.uchicago.edu