[Colloquium] Zou/MS Presentation/Apr 30, 2019

Fri Apr 12 13:34:19 CDT 2019

This is an announcement of Chen Zou's MS Presentation.

------------------------------------------------------------------------------
Date:  Tuesday, April 30, 2019

Time:  11:30 AM

Place:  John Crerar Library 298

M.S. Candidate:  Chen Zou

M.S. Paper Title: MEMORY HIERARCHY DESIGNS FOR TILED HETEROGENEOUS
ARCHITECTURES

Abstract:
Heterogeneous architectures based on accelerators are important paths
to high performance. However, acceleration increases the performance
demands on memory hierarchy designs. In this study, our focus is
tiled heterogeneous architectures that pair accelerators with
conventional cores and tile them across the CPU chip. To understand
the memory hierarchy requirements and challenges of such tiled
heterogeneous architectures, we study a generic accelerator
architecture and different memory system configurations using a
trace-driven simulation framework with a set of high performance
computing benchmarks. We assess simulation results via performance and
area/power consumption analyses. Our results highlight the bandwidth
challenges of the memory hierarchy that tiled acceleration incurs.
Each level of the hierarchy (L1, L2, LLC) must deal with, respectively,
an increased bandwidth requirement, imbalanced off-tile traffic, and
excessive off-tile and off-chip bandwidth requirements. Together, these
can reduce system performance by up to 5.2x. To distill design
insights, we evaluated a set of designs that vary each
level of the memory hierarchy. We propose design insights that address
the important challenges and deliver good area and power efficiency.
To meet the increased bandwidth requirement at the L1 cache, a
high-bank-parallelism L1 cache organization is proposed. Large
tile-level shared L2 caches are recommended to not only effectively
pool the cache capacity and off-tile bandwidth capacity but also
mitigate the off-tile and off-chip bandwidth bottlenecks. Moreover, a
balance between scaling up the tile and scaling out the tiles is
encouraged to mitigate the main memory bandwidth bottlenecks that
ultimately limit the acceleration benefits. Finally, to show the
potential benefits of a tiled heterogeneous architecture, we performed
design optimizations across a broad design space. Among a set of
optimized designs on a Pareto front trading performance against energy
efficiency, a tiled heterogeneous chip with 16 tiles, a 12x faster
accelerator in each tile, and a carefully optimized memory hierarchy
delivers 3.2x the performance of a homogeneous chip with 16 baseline
tiles. Acceleration alone, with only the baseline memory hierarchy
design, delivers 2.2x performance. The use of the proposed
high-bandwidth L1 caches improves the performance to 2.9x, and other
memory hierarchy designs that mitigate bandwidth bottlenecks at the LLC
and main memory bring the system performance to 3.2x.

Chen's advisor is Prof. Andrew Chien.

Log in to the Computer Science Department website for details:
 https://newtraell.cs.uchicago.edu/phd/ms_announcements#chenzou

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)               (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=