[Colloquium] A Computer Science Seminar

Donna Brooms donna at cs.uchicago.edu
Thu Oct 17 05:27:33 CDT 2013


*REMINDER*

COMPUTER SCIENCE
The University of Chicago
_____________________

Thursday, October 17th, 2013
Ryerson 251 @ 2:00 p.m.

Franck Cappello
Argonne National Laboratory
www.mcs.anl.gov/person/franck-cappello
 
Title:  “On the road to exascale resilience: misconceptions, plans, and bumps”
 
Abstract:
Exascale systems are 7 years away. Since 2007, meetings and workshops have explored the main challenges to making exascale feasible. Among them, resilience (or fault tolerance) is considered one of the most critical. Several projections painted an alarming picture. After 5 years of progress, however, resilience for exascale systems is a more mature problem. Thanks to several road-mapping efforts and to technological advances, the community has gained a much better understanding of the situation, and the research issues have been clarified.
 
In this talk, we will discuss the exascale resilience problem as presented in the most recent reports published by the community. We will look at issues that are likely to be solved before exascale, such as proportional fault tolerance for fail-stop errors. Then we will examine challenging problems such as silent data corruption detection and failure prediction. Additionally, we will discuss resilient supporting software.
 
To date, most of the attention has focused on protecting application execution. We now understand that supporting software (runtime, notification and monitoring infrastructure, etc.) also needs to be resilient. This situation opens up exciting research opportunities in the domain of high-performance, scalable, and resilient distributed systems.
 
Host: Andrew Chien
 
Followed by refreshments in Ryerson 255 @ 3:00 p.m.
