[Colloquium] TTI-C Talk: Kamalika Chaudhuri, UCSD

Julia MacGlashan macglashan at tti-c.org
Thu Oct 16 08:48:12 CDT 2008


When:              TODAY: Thursday, October 16 @ 2:00pm

Where:            TTI-C Conference Room: 1427 E. 60th St, 2nd Floor

Who:                Kamalika Chaudhuri, University of California, San Diego

Title:                Learning Mixtures of Distributions through Spectral
Algorithms


Clustering, a method of finding structure in unlabelled data by grouping
the data points into few groups based on a similarity measure, has many
applications in AI, Physics and Biology. A simple theoretical model
that captures clustering is the problem of learning mixtures of
distributions. In this setting, one is given sample points generated
from a mixture of T distributions of a certain type, and the goal is
to recover these distributions and classify the points correctly. In
this talk, motivated by applications in biology, I will focus on learning
mixtures of product distributions.

The most common method in practice is uses principal component
analysis(PCA) as a preprocessing step to find the T-dimensional
subspace that contains the T centers. While this has been analysed
theoretically, it is known to be ineffective in certain situations --
namely, when the proportion of different distributions in the mixture
is too skewed, or when the variance in irrelevant directions is too
high. In the first part of the talk, we present a simple method which
simultaneously exploits the correlation between the signal coordinates
and independence between the noise coordinates to effectively separate
the centers of the distributions.  Our method performs better than
PCA-based algorithms when learning mixtures of binary product
distributions and axis-aligned Gaussians.

In the second part of the talk, motivated again by our application in
biology, we address the sample complexity of learning mixtures of
distributions. We present a simple and efficient algorithm that learns
mixtures of two binary product distributions with low sample
complexity.

[Based on joint work with Satish Rao, Eran Halperin and Shuheng Zhou.]



-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.cs.uchicago.edu/pipermail/colloquium/attachments/20081016/32d668b2/attachment.htm 


More information about the Colloquium mailing list