ColloquiaTTI Talk by Yann LeCun on 1/22/03
Meridel Trimble
mtrimble at tti-c.org
Mon Jan 20 09:57:11 CST 2003
---------------------------------------------------------------
TOYOTA TECHNICAL INSTITUTE - TALK
---------------------------------------------------------------
Date: Wednesday, January 22nd, 2003
Time: 2:30 p.m.
Place: Ryerson Hall 251
Speaker: Yann LeCun, NEC Research Institute
http://yann.lecun.com
Title: Lagrangian Difference Learning with Applications to Vision
Machine learning and statistical modeling are at the core of many
recent advances in data mining, biological data analysis, information
retrieval, human-computer interfaces, and time series prediction.
However, many of the practical successes are applications of the
simplest of learning paradigms -- supervised learning from labeled
examples -- often with relatively small models, limited data sets,
and simple invariances.
Many of the classical "grand challenges" of AI, such as 3D object
recognition, continuous speech recognition, and natural language
understanding, have been (with some notable exceptions) largely out of
reach of machine learning because of the overwhelming dimension of the
input signal (e.g. pixels of an image) and because of the complex
invariances in natural signals such as images, video, audio, and
language.
Visual object recognition is a particularly interesting problem not
only because of its potential practical impact, but also because it
poses the most challenging scientific questions. Cracking it will
require building very large learning systems composed of multiple
heterogeneous modules with millions of adjustable parameters, trained
on millions of examples so as to optimize a global performance
measure. Training a complete recognition system from raw pixels to
object categories requires new ways of integrating heterogeneous
trainable modules such as object detectors, segmenters, feature
extractors, object recognizers, and models of composite objects, so
that they can be trained cooperatively. It requires trainable modules
that can manipulate structured data such as graphs and sequences,
rather than just fixed-size vectors. Finally, it also requires new
ways to construct objective functions that accurately measure the
overall performance of the system while being easy to optimize.
We first propose a methodology to construct objective functions for
such systems. We assume that the stable states of the system are
extrema (saddle points) of a Lagrange function, and show that a large
number of popular supervised and unsupervised learning algorithms can
be written as the difference between two extrema of this Lagrange
function (resulting from different sets of constraints).
Back-propagation, deterministic Boltzmann Machines, discriminative
training algorithms for Hidden Markov Models, and many other
algorithms (old and new) can be written in that form.
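Schematically, the construction described above can be written as follows (the notation here is my own gloss on the abstract, not taken from the talk): let E(W, Z, X) be a Lagrange function over parameters W, internal state Z, and input X, and let Y be the desired output.

```latex
% Schematic Lagrangian-difference objective (notation assumed, not from the talk):
% the loss is the difference between two extrema of the same Lagrange function,
% one with the output constrained to the desired value Y, one unconstrained.
\[
  \mathcal{L}(W) \;=\;
    \operatorname*{extr}_{Z \,:\, \text{output clamped to } Y} E(W, Z, X)
  \;-\;
    \operatorname*{extr}_{Z} E(W, Z, X)
\]
```

Under this reading, back-propagation and deterministic Boltzmann Machine learning correspond to different choices of E and of the constraint set, which is how one objective-function recipe covers both.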
We show that the Lagrangian extremization procedure can be applied to
systems composed of multiple interconnected modules that operate on
vectors, vector sequences, or valued graphs. Such systems, called
Graph Transformer Networks, can deal with inputs that are not easily
handled by traditional learning systems, such as language models,
probabilistic finite-state machines, and other combinatorial objects.
A practical application of GTNs will be briefly described. It combines
convolutional network character recognizers, stochastic language
models, and a discriminative Lagrangian Difference criterion to
recognize bank checks with record accuracy. This system is integrated
in several commercial recognition engines, and currently reads an
estimated 10% to 20% of all the checks written in the US.
Convolutional networks are gradient-based learning systems whose
multilayer architecture is loosely inspired by biological visual
systems. They can be trained to recognize images directly from pixel
data with a high degree of invariance to translations, geometric
distortions, and intra-class variability. Applications of
convolutional nets to face detection, 3D object recognition, and
automatic TV sport classification will be briefly described. A live
demo of a convolutional net that can simultaneously segment and
recognize handwritten digit strings will be shown.
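The two ingredients of the architecture described above -- local trainable filters followed by subsampling -- can be sketched in a few lines. This is a minimal illustration in NumPy of one convolution-plus-subsampling stage, not the speaker's implementation; the layer sizes and the tanh nonlinearity are assumptions chosen for the sketch.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid 2-D correlation of a single-channel image with one kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool(fmap, size=2):
    """Non-overlapping average pooling (the 'subsampling' layer)."""
    H, W = fmap.shape
    H, W = H - H % size, W - W % size
    return fmap[:H, :W].reshape(H // size, size, W // size, size).mean(axis=(1, 3))

# One conv + subsample stage on a toy 8x8 "image" (sizes are illustrative).
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))   # a trainable local filter
fmap = np.tanh(conv2d_valid(image, kernel))  # 6x6 feature map
pooled = avg_pool(fmap)                      # 3x3 after 2x2 subsampling
print(pooled.shape)  # (3, 3)
```

Because the same small kernel is swept across the whole image, the response is equivariant to translation, and the pooling stage then makes it approximately invariant -- the property the abstract attributes to these networks.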
Part of this work is joint with Leon Bottou, Yoshua Bengio, and
Patrick Haffner. On-line demos of convolutional nets are available at
http://yann.lecun.com/exdb/lenet/index.html
Relevant publications are at http://yann.lecun.com/exdb/publis/index.html
Software is available at http://lush.sf.net
*The talk will be followed by refreshments outside Ryerson 251*
If you would like to meet with the speaker, please send e-mail to Meridel
Trimble: mtrimble at tti-c.org