[Colloquium] Dan Klein's talk next Monday, February 16, 2004
Margery Ishmael
marge at cs.uchicago.edu
Mon Feb 9 12:16:28 CST 2004
-------------------------------------------
DEPARTMENT OF COMPUTER SCIENCE - TALK
Date: Monday, February 16, 2004
Time: 2:30 p.m.
Place: Ryerson 251
-------------------------------------------
Speaker: DAN KLEIN, Stanford University
Url: http://www.stanford.edu/~danklein/
Title: Unsupervised Learning of Natural Language Syntax
ABSTRACT:
There is precisely one complete language processing system to date: the
human brain. Though there is debate on how much built-in bias human
learners might have, we definitely acquire language in a primarily
unsupervised fashion. On the other hand, computational approaches to
language processing are almost exclusively supervised, relying on
hand-labeled corpora for training. This reliance is largely due to
repeated failures of unsupervised approaches. In particular, the
problem of learning syntax (grammar) from completely unannotated text
has received a great deal of attention for well over a decade, with
little in the way of positive results. We argue that previous methods
for this task have generally failed because of the representations they
used. Overly complex models are easily distracted by non-syntactic
correlations (such as topical associations), while overly simple models
aren't rich enough to capture important first-order properties of
language (such as directionality, adjacency, and valence). We describe
several syntactic representations which are designed to capture the
basic character of natural language syntax as directly as possible.
With these representations, high-quality parses can be learned from
surprisingly little text, with no labeled examples and no
language-specific biases. Our results are the first to show
above-baseline performance in unsupervised parsing, and far exceed the
baseline (in multiple languages). These specific grammar learning
methods are useful since parsed corpora exist for only a small number
of languages. More generally, most high-level NLP tasks, such as
machine translation and question-answering, lack richly annotated
corpora, making unsupervised methods extremely appealing even for
common languages like English.
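The abstract's contrast between overly rich and overly simple models can be made concrete with a toy sketch. The following is NOT Klein's actual method (which uses dynamic programming over full dependency trees); it is a drastically simplified, hypothetical illustration of unsupervised attachment learning, where each word independently picks a head word and EM re-estimates attachment probabilities conditioned on head tag and direction — the "directionality" property the abstract mentions.

```python
from collections import defaultdict

# Toy unannotated corpus of POS-tag sequences (hypothetical data,
# purely for illustration).
corpus = [
    ["DT", "NN", "VB"],
    ["DT", "NN", "VB", "DT", "NN"],
    ["NN", "VB"],
    ["DT", "NN", "VB", "NN"],
]

ROOT = "ROOT"

# P(dependent_tag | head_tag, direction); start from uniform weights.
prob = defaultdict(lambda: 1.0)

def normalize(counts):
    """Renormalize expected counts into conditional probabilities."""
    totals = defaultdict(float)
    for (head, direc, dep), c in counts.items():
        totals[(head, direc)] += c
    return {k: c / totals[(k[0], k[1])] for k, c in counts.items()}

for _ in range(20):  # EM iterations
    counts = defaultdict(float)
    for sent in corpus:
        for i, dep in enumerate(sent):
            # Candidate heads: every other word, plus an artificial ROOT.
            # "right"/"left" records which side of the head the dependent is on.
            cands = [(ROOT, "right")] + [
                (sent[j], "right" if j < i else "left")
                for j in range(len(sent)) if j != i
            ]
            weights = [prob[(h, d, dep)] for h, d in cands]
            z = sum(weights)
            # E-step: distribute one attachment's worth of expected count
            # across candidate heads in proportion to current probabilities.
            for (h, d), w in zip(cands, weights):
                counts[(h, d, dep)] += w / z
    # M-step: re-estimate the attachment distribution.
    prob = defaultdict(float, normalize(counts))
```

Even this crude model encodes directionality explicitly in its parameters, which is the kind of first-order bias the abstract argues a representation must supply; what it lacks (adjacency and valence) is exactly where the talk's richer representations come in.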
BIO:
Dan Klein is a final-year PhD student in computer science at Stanford
University, working on unsupervised language induction and large-scale
machine learning for NLP, including statistical parsing, information
extraction, fast inference in large dynamic programs, and automatic
clustering. He holds a BA from Cornell University (summa cum laude in
computer science, linguistics, and math) and a master's in linguistics
from Oxford University. Dan's academic honors include a British
Marshall Fellowship, a Microsoft Graduate Research Fellowship, a
Stanford Graduate Fellowship, and, most recently, the Best Paper Award
at ACL 2003.
--------------------------------------------
Host: John Goldsmith
*Refreshments will follow the talk in Ryerson 255*
People in need of assistance should call 773-834-8977 in advance.