[Colloquium] Dan Klein's talk next Monday, February 16, 2004
Margery Ishmael
marge at cs.uchicago.edu
Mon Feb 9 12:16:28 CST 2004
-------------------------------------------
DEPARTMENT OF COMPUTER SCIENCE - TALK
Date: Monday, February 16, 2004
Time: 2:30 p.m.
Place: Ryerson 251
-------------------------------------------
Speaker: DAN KLEIN, Stanford University
Url: http://www.stanford.edu/~danklein/
Title: Unsupervised Learning of Natural Language Syntax
ABSTRACT:
There is precisely one complete language processing system to date: the
human brain. Though there is debate on how much built-in bias human
learners might have, we definitely acquire language in a primarily
unsupervised fashion. On the other hand, computational approaches to
language processing are almost exclusively supervised, relying on
hand-labeled corpora for training. This reliance is largely due to
repeated failures of unsupervised approaches. In particular, the
problem of learning syntax (grammar) from completely unannotated text
has received a great deal of attention for well over a decade, with
little in the way of positive results. We argue that previous methods
for this task have generally failed because of the representations they
used. Overly complex models are easily distracted by non-syntactic
correlations (such as topical associations), while overly simple models
aren't rich enough to capture important first-order properties of
language (such as directionality, adjacency, and valence). We describe
several syntactic representations which are designed to capture the
basic character of natural language syntax as directly as possible.
With these representations, high-quality parses can be learned from
surprisingly little text, with no labeled examples and no
language-specific biases. Our results are the first to show
above-baseline performance in unsupervised parsing, and far exceed the
baseline (in multiple languages). These specific grammar learning
methods are useful since parsed corpora exist for only a small number
of languages. More generally, most high-level NLP tasks, such as
machine translation and question-answering, lack richly annotated
corpora, making unsupervised methods extremely appealing even for
common languages like English.
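The abstract's contrast between overly rich and overly simple models can be made concrete with a toy sketch. The following is NOT Klein's actual method (which uses dynamic programming over full dependency trees); it is a drastically simplified, hypothetical illustration of unsupervised attachment learning, where each word independently picks a head word and EM re-estimates attachment probabilities conditioned on head tag and direction — the "directionality" property the abstract mentions.

```python
from collections import defaultdict

# Toy unannotated corpus of POS-tag sequences (hypothetical data,
# purely for illustration).
corpus = [
    ["DT", "NN", "VB"],
    ["DT", "NN", "VB", "DT", "NN"],
    ["NN", "VB"],
    ["DT", "NN", "VB", "NN"],
]

ROOT = "ROOT"

# P(dependent_tag | head_tag, direction); start from uniform weights.
prob = defaultdict(lambda: 1.0)

def normalize(counts):
    """Renormalize expected counts into conditional probabilities."""
    totals = defaultdict(float)
    for (head, direc, dep), c in counts.items():
        totals[(head, direc)] += c
    return {k: c / totals[(k[0], k[1])] for k, c in counts.items()}

for _ in range(20):  # EM iterations
    counts = defaultdict(float)
    for sent in corpus:
        for i, dep in enumerate(sent):
            # Candidate heads: every other word, plus an artificial ROOT.
            # "right"/"left" records which side of the head the dependent is on.
            cands = [(ROOT, "right")] + [
                (sent[j], "right" if j < i else "left")
                for j in range(len(sent)) if j != i
            ]
            weights = [prob[(h, d, dep)] for h, d in cands]
            z = sum(weights)
            # E-step: distribute one attachment's worth of expected count
            # across candidate heads in proportion to current probabilities.
            for (h, d), w in zip(cands, weights):
                counts[(h, d, dep)] += w / z
    # M-step: re-estimate the attachment distribution.
    prob = defaultdict(float, normalize(counts))
```

Even this crude model encodes directionality explicitly in its parameters, which is the kind of first-order bias the abstract argues a representation must supply; what it lacks (adjacency and valence) is exactly where the talk's richer representations come in.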
BIO:
Dan Klein is a final-year PhD student in computer science at Stanford
University, working on unsupervised language induction and large-scale
machine learning for NLP, including statistical parsing, information
extraction, fast inference in large dynamic programs, and automatic
clustering. He holds a BA from Cornell University (summa cum laude in
computer science, linguistics, and math) and a master's in linguistics
from Oxford University. Dan's academic honors include a British
Marshall Fellowship, a Microsoft Graduate Research Fellowship, a
Stanford Graduate Fellowship, and, most recently, the Best Paper Award
at ACL 2003.
--------------------------------------------
Host: John Goldsmith
*Refreshments will follow the talk in Ryerson 255*
People in need of assistance should call 773-834-8977 in advance.