[Colloquium] Talk by Nathan Srebro, TTI-C on November 3, 2008
Katie Casey
caseyk at cs.uchicago.edu
Wed Oct 22 13:51:15 CDT 2008
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF CHICAGO
Date: Monday, November 3, 2008
Time: 3:45 p.m.
Place: RY 251
----------------------------------------------------------
Speaker: Nathan Srebro
From: TTI-Chicago
Web page: http://ttic.uchicago.edu/~nati/
Title: On the Informational Cost of Tractability
Abstract: As with many other machine learning problems, most
formulations of clustering correspond to non-convex, NP-hard,
optimization problems. Yet, when the data really is clustered, and
enough data is available, local search and other heuristics tend to be
able to recover the clustering. At an extreme regime, we can even
prove the clustering is tractably recoverable. On the other extreme,
if the data is not well clustered, or not enough data is available,
the "optimal" clustering, although hard to find, is meaningless. This
leads to the common wisdom that "clustering isn't hard: its either
easy, or not interesting". Is it in-fact the case that when a
meaningful clustering is statistically recoverable, it is also easy to
find computationally? If so, we certainly do not have a rigorous
understanding of why this is the case. Or, is there a regime in which
the clustering is statistically recoverable, but not tractably
recoverable? E.g., might we need more samples in order to ensure that
our recovery methods work, beyond what is statistically necessary if
we had infinite computational power? Can we quantify this increase in
the required sample size due to our computational limitations? In this
talk I will discuss the above issues, mostly in the context of
clustering data from a mixture of Gaussians, but also in some other
machine learning problems. I will also mention other ways in which
increasing the size of the data reduces, rather than increases, the
amount of required computation, contrary to the standard approach of
studying the complexity of computation as increasing with the size of
the data.
More information about the Colloquium
mailing list