[Colloquium] Reminder - Ivy Wang MS Presentation/May 29, 2024

Megan Woodward via Colloquium colloquium at mailman.cs.uchicago.edu
Wed May 29 08:00:00 CDT 2024


This is an announcement of Ivy Wang's MS Presentation
===============================================
Candidate: Ivy Wang

Date: Wednesday, May 29, 2024

Time: 10 am CT

Location: JCL 298

Remote Location:  https://uchicago.zoom.us/j/95377837272?pwd=VkpzblQweFJFa2dQOS80TytPSmlodz09
Meeting ID: 953 7783 7272
Passcode: 701039

Title: Organize Fine-grained Parallelism Using Keys At Scale

Abstract: Rapidly proliferating machine learning and graph processing applications demand high
performance on petascale datasets. Achieving this performance requires efficient exploitation
of irregular parallelism, as their sophisticated structures and real-world data produce
computations with extreme irregularity (e.g., million-fold skew). The need to exploit large-scale
parallel hardware (million-fold parallelism) is a further challenge.
      Programming irregular data and parallelism using existing models (e.g., MPI) is difficult
because they couple naming, data mapping, and computation mapping. Further, they exploit
only coarse-grained parallelism. To solve this problem, we present a key-based
programming model, called key-value map-shuffle-reduce (KVMSR). The model enables programmers
to express fine-grained parallelism across programmer-defined key-value sets. The parallel
computation can then be optimized using KVMSR’s modular control for load balance and
data locality. KVMSR achieves this by expressing parallelism with respect to a global address
space and providing modular control to flexibly bind computation to compute resources.
      We define the KVMSR model and illustrate it with three programs, a convolution filter,
PageRank, and BFS, to show its ability to separate the expression of a computation from its
binding to computation locations for high performance. We evaluate KVMSR on a novel
fine-grained parallel architecture, called UpDown, which supports up to 4-billion-fold
hardware parallelism in the full system design. On an 8,192-way parallel compute system,
KVMSR’s modular computation-location control achieves up to 9,202x speedup with static
approaches, and 3,136x to 4,258x speedup with dynamic approaches for computation-location
binding, compared to single-threaded CPU programs.
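
To ground the terminology, the sketch below illustrates the general map-shuffle-reduce pattern over key-value pairs that KVMSR builds on, applied to one PageRank-style iteration on a tiny graph. This is not KVMSR’s actual API (the abstract does not specify it); the function names, the toy graph, and the Python setting are all illustrative assumptions.

```python
# Illustrative sketch of the map-shuffle-reduce pattern over key-value
# pairs. NOT KVMSR's actual API, only the general paradigm it extends.
from collections import defaultdict

def map_shuffle_reduce(pairs, map_fn, reduce_fn):
    # Map: each input (key, value) may emit any number of intermediate pairs.
    intermediate = []
    for k, v in pairs:
        intermediate.extend(map_fn(k, v))
    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce: combine each key's values independently; each key is an
    # independent unit of fine-grained parallelism.
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

# Example: one PageRank-style iteration on a hypothetical 3-vertex graph.
# Each vertex splits its rank evenly among its out-neighbors.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {"a": 1.0, "b": 1.0, "c": 1.0}

def contribute(v, rank):
    out = graph[v]
    return [(u, rank / len(out)) for u in out]

new_ranks = map_shuffle_reduce(ranks.items(), contribute,
                               lambda v, contribs: sum(contribs))
```

In a model like the one the abstract describes, the binding of each key’s reduction to a compute location would be controlled separately from this expression of the computation, enabling load balancing across highly skewed key distributions.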


Advisor: Andrew Chien

Committee Members: Andrew Chien, John Reppy, Haryadi Gunawi, and David Gleich






-------------- next part --------------
A non-text attachment was scrubbed...
Name: Ivy_Wang_Masters_Thesis.pdf
Type: application/pdf
Size: 1463880 bytes
Desc: Ivy_Wang_Masters_Thesis.pdf
URL: <http://mailman.cs.uchicago.edu/pipermail/colloquium/attachments/20240529/91f8983b/attachment-0001.pdf>
