[Colloquium] Talk #1 by Leonid Peshkin (Monday, July 14th) at TTI

Tue Jul 8 13:35:27 CDT 2003

--------------------------------------------------------------------------------

TOYOTA TECHNOLOGICAL INSTITUTE 
--------------------------------------------------------------------------------

Date: Monday, July 14th, 2003

Time: 3:30 p.m.

Place: Toyota Technological Institute conference room (The Press Building - 
1427 E. 60th St.)

Speaker: Leonid Peshkin    
Harvard University

Title: "Dynamic Bayesian Nets for Language Modeling"

Abstract: 
Statistical methods in NLP exclude linguistically plausible models due to the 
prohibitive complexity of inference in such models. Dynamic Bayesian networks 
(DBNs) offer an elegant way to integrate various aspects of language in one 
model. Many existing algorithms developed for learning and inference in DBNs 
are applicable to probabilistic language modeling. In particular, a recent leap 
in approximate inference algorithms enables inference in rich probabilistic 
models. To demonstrate the potential of DBNs for natural language processing, 
we employ a DBN for information extraction and part-of-speech tagging tasks. 
Our methods outperform previously published results on an established benchmark 
domain. 

This talk will give an overview of the following papers, available from 
 http://www.ai.mit.edu/~pesha/Public/papers.html 

 "Why Build Another Part-Of-Speech Tagger?" (In review) 
  http://www.ai.mit.edu/~pesha/Public/peshkin.pdf 

- How simple a PoS tagger could we make? 
- How could it be trained independently, then integrated into a system? 
- Is the key to PoS tagging in the features or in the model after all? 
- Do linguistic features really help? 

 "Bayesian Nets in Syntactic Categorization of Novel Words" NAACL-HLT 2003 
 http://www.ai.mit.edu/~pesha/Public/naacl03.pdf 

- Trained on WSJ, our PoS tagger fares well on novel data: the Brown corpus, 
 an email corpus, and even "Jabberwocky". 

  "Bayesian Information Extraction Network" - IJCAI - 2003 
 http://www.eecs.harvard.edu/~pesha/Public/BIEN.pdf 

- We assemble a wealth of emerging linguistic instruments for shallow parsing, 
syntactic and semantic tagging, morphological decomposition, named entity 
recognition, etc., in order to incrementally build a robust information 
extraction system.

 *Refreshments will be served after the talk 

If you wish to meet with the speaker, please send e-mail to Meridel 
(mtrimble at tti-c.org)