[Colloquium] Michael Collins talk tomorrow (Fri, 11/14) at TTI

Thu Nov 13 08:57:13 CST 2003

TOYOTA TECHNOLOGICAL INSTITUTE TALK 

Speaker: Michael Collins
MIT

Speaker’s homepage: http://www.ai.mit.edu/people/mcollins/

Time: 11:45am
Date: Friday, November 14th, 2003
Place: TTI-C (1427 East 60th Street, Second Floor - Press Building)
*FREE LUNCH FOLLOWING THE TALK*

Title: Large-Margin Methods for Natural Language Learning

Abstract: Sequential data is everywhere: obvious examples include text (the web
or digital libraries), speech, and biological sequences. Algorithms that
recover structure underlying this data are becoming increasingly important.
This leads to an interesting class of learning problems: how to learn 
functions which map strings to other discrete structures such as trees, 
segmentations, or underlying state sequences? 

In this talk I will present new algorithms for these problems, derived from 
Freund and Schapire's voted perceptron algorithm for classification tasks. 
Properties of the algorithm depend directly on a modified notion of "margin" 
on training examples. I will describe how the algorithm can be used to rerank 
N-best output from an existing probabilistic model, using essentially 
arbitrary features of competing analyses; how the "kernel trick" applied to 
discrete structures can lead to efficient learning with representations 
tracking an exponential number of "sub-fragments" of a tree or tagged 
sequence; and how the algorithm can be used for efficient discriminative 
training of weighted automata. 

A first motivation for the new algorithms concerns *representation*: in 
comparison to hidden Markov models or probabilistic context-free grammars,
the methods are considerably more flexible in the features that can be used to 
discriminate between competing structures. A second theme is *discriminative 
training*: the parameter estimation methods make relatively weak assumptions 
about the distribution underlying examples. During the talk I will present 
experiments with the methods, showing their utility on a number of natural 
language problems.

If you have questions, or would like to meet the speaker, please contact 
Meridel at: 4-9873 or mtrimble at tti-c.org 

For information on future TTI-C talks or events, please go to the TTI-C Events 
page: http://www.tti-c.org/events.shtml