[Colloquium] Re: REMINDER: 1/25 Talks at TTIC: Ilya Razenshteyn, MIT

Wed Jan 25 10:31:12 CST 2017

When:     Wednesday, January 25th at 11:00 am

Where:    TTIC, 6045 S Kenwood Avenue, 5th Floor, Room 526

Who:       Ilya Razenshteyn, MIT: CSAIL

Title:        New Algorithms for High-Dimensional Data

Abstract: A popular approach in data analysis is to represent a dataset in
a high-dimensional feature vector space, and reduce a given task to a
geometric computational problem. As it turns out, the running times of most
of the classic geometric algorithms suffer from an exponential dependence
on the dimension and are typically not applicable to the high-dimensional
regime. This necessitates the development of new algorithmic approaches
that overcome this "curse of dimensionality". In this talk I will give an
overview of my work in this area. In particular:

* I will describe new algorithms for the high-dimensional Nearest Neighbor
Search (NNS) problem.  Our algorithms improve, for the first time, upon the
popular Locality-Sensitive Hashing (LSH) approach. The main insight behind
the new algorithms is the use of hash functions that are *data dependent*,
i.e., , that are carefully tailored to a given dataset. In contrast, prior
algorithms used generic, data-oblivious hash families.

* Next, I will outline how to make the core component of the above
algorithms---the optimal LSH for cosine distance---practical. This yields
an implementation of LSH which is competitive with the state of the art
heuristics for the NNS problem.

* Finally, I will show how to determine the limits of distance-preserving
data sketching, i.e., vector compression methods that approximately
preserve the distances between vectors. In particular, I will show the
first non-trivial lower bound for the sketching complexity of the
well-studied Earth Mover Distance.

The common theme that unifies these and other algorithms for
high-dimensional data is the use of "efficient data representations":
randomized hashing, sketching, dimensionality reduction, metric embeddings,
and others. One goal of the talk is to describe a holistic view of all of
these techniques.

Host: Yury Makarychev <yury at ttic.edu>

Mary C. Marre
Administrative Assistant
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 504*
*Chicago, IL  60637*
*p:(773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu <mmarre at ttic.edu>*

On Tue, Jan 24, 2017 at 2:14 PM, Mary Marre <mmarre at ttic.edu> wrote:

> When:     Wednesday, January 25th at 11:00 am
>
> Where:    TTIC, 6045 S Kenwood Avenue, 5th Floor, Room 526
>
> Who:       Ilya Razenshteyn, MIT: CSAIL
>
>
> Title:        New Algorithms for High-Dimensional Data
>
> Abstract: A popular approach in data analysis is to represent a dataset
> in a high-dimensional feature vector space, and reduce a given task to a
> geometric computational problem. As it turns out, the running times of most
> of the classic geometric algorithms suffer from an exponential dependence
> on the dimension and are typically not applicable to the high-dimensional
> regime. This necessitates the development of new algorithmic approaches
> that overcome this "curse of dimensionality". In this talk I will give an
> overview of my work in this area. In particular:
>
> * I will describe new algorithms for the high-dimensional Nearest Neighbor
> Search (NNS) problem.  Our algorithms improve, for the first time, upon the
> popular Locality-Sensitive Hashing (LSH) approach. The main insight behind
> the new algorithms is the use of hash functions that are *data dependent*,
> i.e., , that are carefully tailored to a given dataset. In contrast, prior
> algorithms used generic, data-oblivious hash families.
>
> * Next, I will outline how to make the core component of the above
> algorithms---the optimal LSH for cosine distance---practical. This yields
> an implementation of LSH which is competitive with the state of the art
> heuristics for the NNS problem.
>
> * Finally, I will show how to determine the limits of distance-preserving
> data sketching, i.e., vector compression methods that approximately
> preserve the distances between vectors. In particular, I will show the
> first non-trivial lower bound for the sketching complexity of the
> well-studied Earth Mover Distance.
>
> The common theme that unifies these and other algorithms for
> high-dimensional data is the use of "efficient data representations":
> randomized hashing, sketching, dimensionality reduction, metric embeddings,
> and others. One goal of the talk is to describe a holistic view of all of
> these techniques.
>
>
> Host: Yury Makarychev <yury at ttic.edu>
>
>
>
>
> Mary C. Marre
> Administrative Assistant
> *Toyota Technological Institute*
> *6045 S. Kenwood Avenue*
> *Room 504*
> *Chicago, IL  60637*
> *p:(773) 834-1757 <(773)%20834-1757>*
> *f: (773) 357-6970 <(773)%20357-6970>*
> *mmarre at ttic.edu <mmarre at ttic.edu>*
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cs.uchicago.edu/pipermail/colloquium/attachments/20170125/37e3cedb/attachment-0001.html>