ColloquiaTTI Talk by Yann LeCun on 1/22/03
Meridel Trimble
mtrimble at tti-c.org
Mon Jan 20 09:57:11 CST 2003
---------------------------------------------------------------
TOYOTA TECHNICAL INSTITUTE - TALK
---------------------------------------------------------------
Date: Wednesday, January 22nd, 2003
Time: 2:30 p.m.
Place: Ryerson Hall 251
Speaker: Yann LeCun, NEC Research Institute
http://yann.lecun.com
Title: Lagrangian Difference Learning with Applications to Vision
Machine learning and statistical modeling are at the core of many
recent advances in data mining, biological data analysis, information
retrieval, human-computer interfaces, and time series prediction.
However, many of the practical successes are applications of the
simplest of learning paradigms -- supervised learning from labeled
examples -- often with relatively small models, limited data sets,
and simple invariances.
Many of the classical "grand challenges" of AI, such as 3D object
recognition, continuous speech recognition, and natural language
understanding, have been (with some notable exceptions) largely out of
reach of machine learning because of the overwhelming dimension of the
input signal (e.g. pixels of an image) and because of the complex
invariances in natural signals such as images, video, audio, and
language.
Visual object recognition is a particularly interesting problem not
only because of its potential practical impact, but also because it
poses the most challenging scientific questions. Cracking it will
require building very large learning systems composed of multiple
heterogeneous modules with millions of adjustable parameters, trained
on millions of examples so as to optimize a global performance
measure. Training a complete recognition system from raw pixels to
object categories requires new ways of integrating heterogeneous
trainable modules such as object detectors, segmenters, feature
extractors, object recognizers, and models of composite objects, so
that they can be trained cooperatively. It requires trainable modules
that can manipulate structured data such as graphs and sequences,
rather than just fixed-size vectors. Finally, it also requires new
ways to construct objective functions that accurately measure the
overall performance of the system while being easy to optimize.
We first propose a methodology to construct objective functions for
such systems. We assume that the stable states of the system are
extrema (saddle points) of a Lagrange function, and show that a large
number of popular supervised and unsupervised learning algorithms can
be written as the difference between two extrema of this Lagrange
function (resulting from different sets of constraints).
Back-propagation, deterministic Boltzmann Machines, discriminative
training algorithms for Hidden Markov Models, and many other
algorithms (old and new) can be written in that form.
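Schematically, the construction described above can be written as follows (the notation here is my own gloss on the abstract, not taken from the talk): let E(W, Z, X) be a Lagrange function over parameters W, internal state Z, and input X, and let Y be the desired output.

```latex
% Schematic Lagrangian-difference objective (notation assumed, not from the talk):
% the loss is the difference between two extrema of the same Lagrange function,
% one with the output constrained to the desired value Y, one unconstrained.
\[
  \mathcal{L}(W) \;=\;
    \operatorname*{extr}_{Z \,:\, \text{output clamped to } Y} E(W, Z, X)
  \;-\;
    \operatorname*{extr}_{Z} E(W, Z, X)
\]
```

Under this reading, back-propagation and deterministic Boltzmann Machine learning correspond to different choices of E and of the constraint set, which is how one objective-function recipe covers both.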
We show that the Lagrangian extremization procedure can be applied to
systems composed of multiple interconnected modules that operate on
vectors, vector sequences, or valued graphs. Such systems, called
Graph Transformer Networks, can deal with inputs that are not easily
handled by traditional learning systems, such as language models,
probabilistic finite-state machines, and other combinatorial objects.
A practical application of GTNs will be briefly described. It combines
convolutional network character recognizers, stochastic language
models, and a discriminative Lagrangian Difference criterion to
recognize bank checks with record accuracy. This system is integrated
in several commercial recognition engines, and currently reads an
estimated 10% to 20% of all the checks written in the US.
Convolutional networks are gradient-based learning systems whose
multilayer architecture is loosely inspired by biological visual
systems. They can be trained to recognize images directly from pixel
data with a high degree of invariance to translations, geometric
distortions, and intra-class variability. Applications of
convolutional nets to face detection, 3D object recognition, and
automatic TV sport classification will be briefly described. A live
demo of a convolutional net that can simultaneously segment and
recognize handwritten digit strings will be shown.
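The two ingredients of the architecture described above -- local trainable filters followed by subsampling -- can be sketched in a few lines. This is a minimal illustration in NumPy of one convolution-plus-subsampling stage, not the speaker's implementation; the layer sizes and the tanh nonlinearity are assumptions chosen for the sketch.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid 2-D correlation of a single-channel image with one kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool(fmap, size=2):
    """Non-overlapping average pooling (the 'subsampling' layer)."""
    H, W = fmap.shape
    H, W = H - H % size, W - W % size
    return fmap[:H, :W].reshape(H // size, size, W // size, size).mean(axis=(1, 3))

# One conv + subsample stage on a toy 8x8 "image" (sizes are illustrative).
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))   # a trainable local filter
fmap = np.tanh(conv2d_valid(image, kernel))  # 6x6 feature map
pooled = avg_pool(fmap)                      # 3x3 after 2x2 subsampling
print(pooled.shape)  # (3, 3)
```

Because the same small kernel is swept across the whole image, the response is equivariant to translation, and the pooling stage then makes it approximately invariant -- the property the abstract attributes to these networks.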
Part of this work is joint with Leon Bottou, Yoshua Bengio, and
Patrick Haffner. On-line demos of convolutional nets are available at
http://yann.lecun.com/exdb/lenet/index.html
Relevant publications are at http://yann.lecun.com/exdb/publis/index.html
Software is available at http://lush.sf.net
*The talk will be followed by refreshments outside Ryerson 251*
If you would like to meet with the speaker, please send e-mail to Meridel
Trimble: mtrimble at tti-c.org