[Colloquium] Wenyi Wang MS Presentation/Mar 13, 2025

via Colloquium colloquium at mailman.cs.uchicago.edu
Thu Mar 6 15:21:05 CST 2025


This is an announcement of Wenyi Wang's MS Presentation
===============================================
Candidate: Wenyi Wang

Date: Thursday, March 13, 2025

Time: 3 pm CDT

Location: JCL 298

Remote Location: https://uchicago.zoom.us/j/92076164123?pwd=I5njfvu6d24TvWbNCNUCEmHQFVnD1K.1

Title: Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems

Abstract: Achieving efficient task parallelism on many-core architectures remains a significant challenge. The widely used GNU implementation of the OpenMP parallel programming model incurs high overhead for fine-grained, short-running tasks because of the time spent on runtime synchronization. In this work, we introduce and analyze three key advances that together yield significant performance gains. First, we introduce XQueue, a lock-less concurrent queue that replaces GNU's priority task queue and removes the global task lock. Second, we develop a scalable, efficient, hybrid lock-free/lock-less distributed tree barrier to address the high hardware synchronization overhead of GNU's centralized barrier. Third, we develop two lock-less, NUMA-aware load-balancing strategies. We evaluate our implementation using the Barcelona OpenMP Task Suite (BOTS) benchmarks. The first two advances together deliver up to a 1522.8× performance improvement over the original GNU OpenMP. Lock-less load balancing adds a further improvement of up to 4× over GNU OpenMP with XQueue. Using a rich set of profiling and instrumentation tools, we investigate the runtime behavior of GNU OpenMP and improve its performance on fine-grained tasks by more than three orders of magnitude.
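The announcement does not describe XQueue's internals. As a rough illustration of what "lock-less" means for a task queue, here is a minimal sketch of a generic bounded single-producer/single-consumer ring buffer in C++. Each index is written by exactly one thread, so acquire/release atomics replace a mutex entirely. It is only an assumption that an XQueue-style design composes many such queues (for example, one per producer/consumer thread pair); the names `SpscQueue` and `run_demo` are hypothetical, and this is not the XQueue implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical bounded SPSC queue: exactly one thread calls push(),
// exactly one (other) thread calls pop(). Holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscQueue {
  static_assert(Capacity >= 2, "need at least one usable slot");

 public:
  // Producer side: returns false instead of blocking when the queue is full.
  bool push(const T& item) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    const std::size_t next = (tail + 1) % Capacity;
    // Acquire pairs with the consumer's release on head_, so the slot
    // we are about to overwrite has already been read.
    if (next == head_.load(std::memory_order_acquire)) return false;
    buf_[tail] = item;
    // Release publishes the written slot before the new tail is visible.
    tail_.store(next, std::memory_order_release);
    return true;
  }

  // Consumer side: returns std::nullopt instead of blocking when empty.
  std::optional<T> pop() {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    // Acquire pairs with the producer's release on tail_.
    if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
    T item = buf_[head];
    head_.store((head + 1) % Capacity, std::memory_order_release);
    return item;
  }

 private:
  T buf_[Capacity];
  // Separate cache lines so producer and consumer do not false-share.
  alignas(64) std::atomic<std::size_t> head_{0};
  alignas(64) std::atomic<std::size_t> tail_{0};
};

// Hypothetical demo: one producer thread sends 1..n, the calling thread
// consumes them and returns their sum. No locks are taken anywhere.
long long run_demo(int n) {
  SpscQueue<int, 1024> q;
  std::thread producer([&q, n] {
    for (int i = 1; i <= n; ++i)
      while (!q.push(i)) std::this_thread::yield();  // queue full: retry
  });
  long long sum = 0;
  for (int received = 0; received < n;) {
    if (auto v = q.pop()) {
      sum += *v;
      ++received;
    } else {
      std::this_thread::yield();  // queue empty: retry
    }
  }
  producer.join();
  return sum;
}
```

Calling `run_demo(100000)` should return 5000050000, the sum of 1 through 100000. Returning `false` or `std::nullopt` rather than blocking keeps both operations lock-less; a real runtime would decide what a worker does on failure (for example, try another queue).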

Advisors: Ian Foster and Kyle Chard

Committee: Kyle Chard, Ioan Raicu, Ian Foster

