[Colloquium] JOB TALK: Shay Cohen, CMU on January 28

Katie Casey caseyk at cs.uchicago.edu
Thu Jan 20 10:15:45 CST 2011


DEPARTMENT OF COMPUTER SCIENCE

UNIVERSITY OF CHICAGO

Date: Friday, January 28, 2011
Time: 2:30 p.m.
Place: Ryerson 251, 1100 E. 58th Street

----------------------------------------------

Speaker:		Shay Cohen

From:		Carnegie Mellon University

Web page:	http://www.cs.cmu.edu/~scohen/

Title: 		Unsupervised Learning of Syntax Using Probabilistic Grammars

Abstract:  	We are facing enormous growth in the information available from various data
sources, most noticeably text, such as the wealth of untapped textual data
in billions of multilingual web pages.

Since the 1990s, approaches for developing text analysis systems have been
based on learning from data annotated by humans. This annotation process
is expensive and time-consuming. Another line of research, which has
recently received considerable attention, focuses on unsupervised
learning: learning from data in its raw form, without any kind of
annotation. In this talk, I will address the problem of performing
unsupervised learning of syntax for natural language. Learning natural
language syntax lies at the core of computational linguistics, as
syntactic analysis is a required preprocessing step for many natural
language applications.

The basic mechanism I will describe is the estimation of probabilistic
grammars, a family of formalisms that enable us to describe rich models
for language learning. I will suggest a Bayesian approach to overcome
current weaknesses in the estimation of probabilistic grammars. Our Bayesian
approach relies on a novel family of prior distributions, rich enough to
yield accurate estimation of probabilistic grammars. I will then provide
some analysis of the inherent difficulties we face when estimating
probabilistic grammars, in terms of both computational and sample
complexity.

The principles I will present in this talk have promising implications for
future work beyond natural language. Fields such as computational biology,
human activity analysis, computer vision, and other areas that require
estimating hidden structures based on a grammar may benefit greatly from
our approach.

Host: 		John Goldsmith

Refreshments will be served following the talk at 3:30 in Ryerson 255.

