[Colloquium] Reminder: Bodzsar/MS Presentation/May 9, 2013

Wed May 8 10:09:49 CDT 2013

This is a reminder about Erik's MS Presentation tomorrow.

------------------------------------------------------------------------------
Date:  Thursday, May 9, 2013

Time:  12:00 PM

Place:  Ryerson 352 (the "Barn")

M.S. Candidate:  Erik Bodzsar

M.S. Paper Title: Limitations of Data Reuse in Streaming Iterative
Algorithms

Abstract:
It is well-known that the MapReduce programming model is not
expressive enough for many applications. However, more expressive big
data computation systems are in-memory and therefore have limited
scalability. We propose a scale up model for big data processing,
using SSDs to eliminate the memory limitation and work on data sets
bigger than memory.

We explore how excess parallelism can be used by an out-of-core
computation system to decrease I/O. We focus on iterative streaming
algorithms and how computation reordering (based on knowledge about
parallelism) can be used to reduce their total I/O and therefore
running time in an out-of-core setting.

We propose two task schedulers that exploit the data reuse of
streaming iterative algorithms. One exploits knowledge about memory
contents to greedily execute tasks that require the least amount of
I/O. The other exploits iterative algorithm structure by reversing
execution order in every iteration to maximize data reuse between
iterations. We evaluate the proposed schedulers using Blockus, a
single-machine system that performs transparent out-of-core
computation. Blockus is built on top of Presto, a parallel programming
model and distributed execution engine for R.

The proposed schedulers can achieve 20-50% speedups over naive
schedulers for simple iterative streaming algorithms on data sets that
are 2-5 times bigger than memory size. However, the proposed methods
do not scale to data sizes orders of magnitude bigger than memory
size, because the speedup is roughly inversely proportional to the
size of the data set. This means that computation reordering is an
ineffective scale-up technique for streaming iterative algorithms; the
excess parallelism and data reuse cannot be exploited to scale up to
data set sizes that are multiple times bigger than memory.

Erik's advisor is Prof. Andrew Chien

Login to the Computer Science Department website for details:
 https://www.cs.uchicago.edu/phd/ms_announcements#erikb

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)               (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=