[Colloquium] Revised Talk by Chi-Keung Luk, Intel Corporation on Friday April 4, 2008

Wed Mar 26 14:25:35 CDT 2008

DEPARTMENT OF COMPUTER SCIENCE

UNIVERSITY OF CHICAGO

Date: Friday, April 4, 2008
Time: 2:30 p.m.
Place: Ryerson 251, 1100 E. 58th Street

----------------------------------------------------------

Speaker:	Chi-Keung Luk

From:		Intel Corporation

Web page:	http://www.ckluk.org/ck/

Title: Exploiting Multicore Parallelism with Dynamic Instrumentation  
and Compilation

Abstract: The emerging multicore era has brought many opportunities  
and challenges to systems research. Two of the challenges I have been  
focusing on are (i) how to provide detailed analysis of parallel  
programs and (ii) how to map computations in a parallel program to the  
underlying hardware in order to achieve optimal performance.

For (i), we have developed the Pin dynamic instrumentation system,  
which has become very popular for writing architectural and program  
analysis tools. By inserting instrumentation code on the fly, Pin can
perform fine-grained monitoring of the architectural state of a program.
As an example, I will discuss a parallel programming tool called  
Thread Checker which we built with Pin for detecting common parallel  
programming bugs like data races and deadlocks. I will also discuss  
the dynamic compilation techniques behind Pin. In addition, I will  
present an extension of Pin called PinOS, which performs whole-system  
instrumentation (i.e., of both the OS and applications) by using
virtualization techniques.

For (ii), I have developed the Qilin parallel programming system,  
which exploits the hardware parallelism available on machines with a  
multicore CPU and a GPU. Qilin provides a C++ API for writing
data-parallel operations so that the compiler is relieved of the
difficult job of extracting parallelism from serial code. At runtime,
the Qilin compiler automatically partitions these API calls into tasks  
and maps these tasks to the underlying hardware using an adaptive  
algorithm. Preliminary results show that our parallel system can  
achieve significant speedups (above 10x) over the serial case for some  
important computation kernels.

Finally, I will outline my future work in parallel programming,
compilation, and virtualization.
---------------------------------------------------------

Host:	John Reppy