<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


</head>


<body>


<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">


<div class="PlainText">This is an announcement of Ivy Wang's MS Presentation<br>


===============================================<br>


Candidate: Ivy Wang<br>


<br>


Date: Wednesday, May 29, 2024<br>


<br>


Time: 10 am CT<br>


<br>


Location: JCL 298<br>


<br>


Title: Organize Fine-grained Parallelism Using Keys At Scale<br>


<br>


Abstract: Rapidly proliferating machine learning and graph processing applications demand high<br>


performance on petascale datasets. Achieving this performance requires efficient exploitation<br>


of irregular parallelism, as their sophisticated structures and real-world data produce<br>


computations with extreme irregularity (e.g., million-fold skew). The need to exploit large-scale<br>


parallel hardware (million-fold parallelism) is a further challenge.<br>


      Programming irregular data and parallelism using existing models (e.g., MPI) is difficult<br>


because they couple naming, data mapping, and computation mapping. Further, they only<br>


exploit coarse-grained parallelism. To solve this problem, we present a key-based<br>


programming model, called key-value map-shuffle-reduce (KVMSR). The model enables programmers<br>


to express fine-grained parallelism across programmer-defined key-value sets. The parallel<br>


computation can then be optimized using KVMSR’s modular control for load balance and<br>


data locality. KVMSR achieves this by expressing parallelism with respect to a global address<br>


space and providing modular control to flexibly bind computation to compute resources.<br>


      We define the KVMSR model and illustrate it with three programs (convolution filter,<br>


PageRank, and BFS) to show its ability to separate computation expression from binding to<br>


computation location for high performance. We evaluate KVMSR on a novel fine-grained<br>


parallel architecture, called UpDown, supporting up to 4-billion-fold hardware parallelism<br>


in the full system design. On an 8,192-way parallel compute system, KVMSR modular<br>


computation location control achieves up to 9,202x speedup with static approaches and<br>


a speedup of 3,136x to 4,258x with dynamic approaches for computation location<br>


binding, compared to single-threaded CPU programs.<br>


<br>


<br>


Advisor: Andrew Chien<br>


<br>


Committee Members: Andrew Chien, John Reppy, Haryadi Gunawi, and David Gleich<br>


<br>


<br>


<br>


</div>


</span></font></div>




</body>


</html>