[Colloquium] Reminder: Balkir/Dissertation Defense/Nov 12, 2013

Margaret Jaffey margaret at cs.uchicago.edu
Mon Nov 11 10:08:33 CST 2013


This is a reminder about Soner's defense tomorrow.

       Department of Computer Science/The University of Chicago

                     *** Dissertation Defense ***


Candidate:  Atilla Soner Balkir

Date:  Tuesday, November 12, 2013

Time:  10:00 AM

Place:  Searle 240a

Title: APPROXIMATING GEODESIC DISTANCE AND GRAPH CENTRALITY ON SHARED
NOTHING ARCHITECTURES

Abstract:
This thesis presents a parallel toolkit for pairwise distance
computation in massive networks. Computing the exact shortest paths
between a large number of vertices is a costly operation and serial
algorithms are not practical for billion-scale graphs. We first
describe an efficient parallel method to solve the single source
shortest path problem on commodity hardware with no shared memory.
Given a source vertex s, the traditional solutions to this problem
typically return one shortest path for each (s, v) pair in the graph
where v ∈ V . We slightly modify the original problem definition
and extract multiple shortest paths between each (s, v) pair. An
important characteristic of large real-world graphs is the skewed
degree distribution where the fraction p(k) of vertices with degree k
asymptotically follows a power law. In parallel frameworks, the skewed
degree distribution often results in straggling tasks and
under-utilization of the computing hardware. A large number of idle
computers wait for a small number of busy ones that consume a lot of
CPU cycles while processing a few vertices with very high degree. The
presented algorithm for solving the single source shortest path
problem includes a load balancing technique to deal with the skewed
degree distribution in large graphs. We also describe two novel
optimization methods called Selective Push and Incremental Graph
Serialization to avoid generating and exchanging redundant messages
between compute nodes while minimizing the amount of data shuffled
through the network.

Using the output of the first algorithm as a building block, we
introduce a new parallel algorithm to estimate shortest paths between
arbitrary pairs of vertices. Our method exploits data locality,
produces highly accurate results and allows batch computation of
shortest paths with 7% average error in graphs that contain billions
of edges. The proposed algorithm is up to two orders of magnitude
faster than previously suggested algorithms on a single computer and
it does not require large amounts of memory or expensive high-end
servers. Furthermore, this algorithm runs in parallel where each
compute node can work on a different subset of vertex pairs
independently. We present alternative parallel implementations of this
method and discuss their efficiency using a communication cost model.
To the best of our knowledge, this is the first algorithm that can
approximate pairwise distance in large batches based on an
embarrassingly parallel scheme.

We finally look at alternative methods to approximate the closeness
and betweenness centrality metrics, which involve systems challenges
dealing with indexing, joining and comparing large datasets
efficiently. Exact computation of these measures involves solving the
all pairs shortest path problem. Existing parallel algorithms for
approximating closeness and betweenness centrality are designed for
high-end shared memory symmetric multiprocessor architectures. Graph
size is still a limiting factor as these algorithms are memory
intensive, and they do not exhibit high degrees of cache and memory
locality. There is also the hardware cost and availability issue. Most
network scientists do not have direct access to super computers or
high-end shared memory multiprocessor architectures. On the other
hand, commodity clusters and shared nothing architectures with
independent compute nodes are becoming widely available. They are also
offered through cloud computing vendors in the form of infrastructure
as a service. We use the parallel distance estimation algorithm to
approximate the closeness and betweenness centrality metrics in large
graphs. In one experiment, we mined a real-world web graph with 700
million nodes and 12 billion edges to identify the most central
vertices and calculated more than 63 billion shortest paths in 6 hours
on a 20-node commodity cluster.



Atilla Soner's advisor is Prof. Ian Foster

Login to the Computer Science Department website for details,
including a draft copy of the dissertation:

 https://www.cs.uchicago.edu/phd/phd_announcements#soner

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)               (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


More information about the Colloquium mailing list