[Colloquium] Seminar Announcement: Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

Ninfa Mayorga ninfa at uchicago.edu
Wed Apr 13 08:34:00 CDT 2016


Reminder

Computation Institute Presentation - Data Lunch Seminar (DLS)

Speaker:  Wei Tan, Research Staff Member, IBM T. J. Watson Research Center
Host:  Kyle Chard
Date:  April 13, 2016
Time: 12:00 PM - 1:00 PM 
Location: The University of Chicago, Searle 240A, 5735 S. Ellis Ave.

Title:  Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

Abstract: Matrix factorization (MF), a technique for reducing dimensionality, is at the core of many common machine learning algorithms, e.g., collaborative filtering. MF is computationally expensive, and on large-scale datasets its run time remains substantial even on high-performance clusters. GPUs, with their massive number of cores and high intra-chip memory bandwidth, create new opportunities to accelerate MF. In this talk I will introduce cuMF, a CUDA-based matrix factorization library that optimizes the alternating least squares (ALS) method to solve very large-scale MF problems. cuMF uses a set of techniques to maximize performance on single and multiple GPUs: smart access of sparse data by leveraging GPU memory hierarchies, data parallelism used in conjunction with model parallelism, minimized communication overhead among GPUs, and a novel topology-aware parallel reduction scheme. On a single machine with four NVIDIA GPU cards, cuMF is 6-10 times as fast, and 33-100 times as cost-efficient, as state-of-the-art distributed CPU solutions. Moreover, cuMF can solve the largest matrix factorization problems reported in the current literature. cuMF can also be used to accelerate the ALS implementation in Spark MLlib.
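
For context, the sketch below shows the basic ALS iteration that cuMF accelerates, written as a dense NumPy loop. It is an illustration only, not cuMF's CUDA implementation: it treats the rating matrix as fully observed rather than sparse, and the dimensions, rank, and regularization value are made-up examples.

    import numpy as np

    def als(R, rank=8, reg=0.1, iters=10):
        # Factor R ~ X @ Y.T by alternating regularized least-squares solves.
        # (Illustrative dense version; real ALS for collaborative filtering
        # solves per-row problems over the observed ratings only.)
        m, n = R.shape
        rng = np.random.default_rng(0)
        X = rng.standard_normal((m, rank))
        Y = rng.standard_normal((n, rank))
        I = reg * np.eye(rank)
        for _ in range(iters):
            # Fix Y, solve the normal equations (Y'Y + reg*I) X' = Y'R' for X.
            X = np.linalg.solve(Y.T @ Y + I, Y.T @ R.T).T
            # Fix X, solve (X'X + reg*I) Y' = X'R for Y.
            Y = np.linalg.solve(X.T @ X + I, X.T @ R).T
        return X, Y

    # Toy usage: factorize a random 100 x 50 matrix and report the residual.
    R = np.abs(np.random.default_rng(1).standard_normal((100, 50)))
    X, Y = als(R)
    print(np.linalg.norm(R - X @ Y.T))

cuMF's contribution lies in carrying out these alternating solves efficiently on GPUs, using the techniques listed in the abstract above (memory-hierarchy-aware access to the sparse ratings, combined data and model parallelism, and topology-aware reduction across GPUs).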
 
A paper on cuMF will appear at HPDC 2016. A preprint is available at http://arxiv.org/abs/1603.03820.
 
Bio: Wei Tan is a Research Staff Member at IBM T. J. Watson Research Center. His research interests include big data, distributed systems, NoSQL, and services computing. He currently focuses on accelerating machine learning algorithms using scale-up (e.g., GPU) and scale-out (e.g., Spark) approaches. His work has been incorporated into the IBM patent portfolio and software products such as BigInsights and Cognos. Before joining IBM he worked at the Computation Institute, a joint institute of the University of Chicago and Argonne National Laboratory, on scientific workflows. For more information, see http://researcher.ibm.com/researcher/view.php?person=us-wtan.
 
Information:  Lunch will be provided

