[Colloquium] Bresnahan/MS Presentation/Dec. 13, 2006

Margaret Jaffey margaret at cs.uchicago.edu
Mon Nov 27 10:48:53 CST 2006


This is an announcement of John Bresnahan's MS Presentation.

---------------
Date:  Wednesday, December 13, 2006

Time:  10:00 a.m.

Place:  Ryerson 276

M.S. Candidate:  John Bresnahan

M.S. Paper Title:  An Architecture for Dynamic Allocation of Compute  
Cluster Bandwidth

Abstract:
Modern high-performance computers are often implemented as clusters.  
A cluster comprises several machines, each representing a
compute node. Local resource managers are used to grant access to  
these nodes. Once access is granted to a node, a user has exclusive  
access to the entire node including its local filesystem and network  
interface. In addition to hosting a computational cluster, these
sites are often well connected to a network: for example, sites
participating in the US TeraGrid system have between ten and forty
gigabits per second of available bandwidth, and each computational
node in a TeraGrid cluster typically has a one gigabit per second
network interface. However, because these are compute clusters, jobs
are often computationally bound, in which case most of this bandwidth
goes unused.

In order to stage data sets in and out, or to publish computational
results, sites provide file transfer services. Achieving peak
transfer speeds requires a significant number of nodes. Packet
switching at top speeds puts a load on the CPU that prohibits
co-locating user compute jobs and site transfer processes on the same
node. Thus, administrators partition their resources into compute
nodes and transfer nodes. Because transfer requirements fluctuate and
idle resources are undesirable, it is difficult to determine this
partition statically. Ideally, the partitioning could be adjusted
dynamically to suit immediate needs.

This paper studies this dynamic resource partitioning problem. We  
propose an architecture that allows transfer nodes to be acquired  
from the computational queue when needed, and returned when no longer  
required. The site administrator has a means to set a policy  
regarding how long a transfer node can be idle before being returned,  
and when to request additional nodes. We measure the costs associated  
with a prototype implementation of this architecture and study the  
impact of different policies on achieved performance.
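
As a rough illustration only (not taken from the paper), the two policy
knobs described above, an idle timeout and a trigger for requesting
more nodes, might be sketched as follows. The names Policy,
TransferNode, plan, idle_timeout_s, and max_load_per_node are all
hypothetical, and the load-per-node trigger is an assumed stand-in for
whatever criterion the actual architecture uses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Policy:
    # Hypothetical knobs; the paper's real policy parameters are not given here.
    idle_timeout_s: float    # how long a transfer node may sit idle before release
    max_load_per_node: int   # assumed trigger: transfers one node should carry at most


@dataclass
class TransferNode:
    name: str
    active_transfers: int
    idle_since: Optional[float]  # time the node became idle; None while busy


def plan(policy: Policy, nodes: List[TransferNode], now: float) -> Tuple[List[str], int]:
    """Return (names of nodes to hand back to the compute queue,
    number of extra nodes to request from it)."""
    # Release nodes that have been idle for at least the configured timeout.
    release = [
        n.name
        for n in nodes
        if n.active_transfers == 0
        and n.idle_since is not None
        and now - n.idle_since >= policy.idle_timeout_s
    ]
    total_transfers = sum(n.active_transfers for n in nodes)
    kept = len(nodes) - len(release)
    # Ceiling division: nodes needed so that none exceeds the load limit.
    needed = -(-total_transfers // policy.max_load_per_node)
    return release, max(0, needed - kept)


if __name__ == "__main__":
    policy = Policy(idle_timeout_s=300, max_load_per_node=4)
    nodes = [
        TransferNode("a", 9, None),    # overloaded
        TransferNode("b", 0, 0.0),     # idle well past the timeout
        TransferNode("c", 0, 900.0),   # idle only briefly
    ]
    print(plan(policy, nodes, now=1000.0))
```

In this sketch a shorter idle timeout returns nodes to the compute
queue sooner but risks re-acquiring them shortly afterwards; this is
the kind of trade-off between policies that the abstract says the
paper measures.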

Advisor:  Prof. Ian Foster

A draft copy of John Bresnahan's MS Paper is available in Ry 161A.


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey                             margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 161A)        (773) 702-6011
The University of Chicago                  http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



