[Colloquium] TTIC Talk: Michel Galley, Stanford

Julia MacGlashan macglashan at tti-c.org
Wed Jan 20 08:36:45 CST 2010


When:             *Friday, Jan 22 @ 11:00am*

Where:           *TTIC Conference Room #526*, 6045 S Kenwood Ave


Who:               *Michel Galley*, Stanford


Title:             *Machine Translation: Re-envisioning the Model Space*



As the Internet grows linguistically more heterogeneous, information
access becomes harder: retrieved documents must be translated into a wide
range of languages. Fortunately, machine translation (MT) has made
significant progress in recent years thanks to a shift toward corpus-based
and statistical methods, which make it feasible to build MT systems for
many language pairs. However, the translations produced by statistical MT
systems remain poor, and often unintelligible, for structurally divergent
language pairs such as Chinese-English. The root of the problem is that
most state-of-the-art techniques parameterize the translation process
using words or word sequences, which fail to capture the global features
needed to perform the long-distance (e.g., sentential) transformations
that yield grammatical output. To address this
shortcoming, I will describe a technique for learning syntactic translation
models that map grammatical structures (e.g., verb and noun phrases) from
one language to another. The key benefit of this technique is that it copes
with major structural divergences between languages by learning
non-isomorphic tree mappings, which is done in linear time and thus easily
scales to hundreds of millions of words. But despite being able to model
global syntactic relationships, syntactic MT systems are prohibitively slow
at test time due to cubic-time tree-based decoding algorithms, and thus
often fail to deliver on-the-fly translations. To address this issue, I will
also present simpler translation models that retain the main advantages of
syntactic systems, namely parsimony and good generalization to unseen
data, while requiring no syntactic annotation. These simpler models can be
decoded in linear time, without resorting to any tree-based decoding
algorithm, while delivering a competitive level of translation quality. The
latter work forms the basis of the Stanford MT system, which ranked second
in the 2009 NIST MT evaluation in Arabic-to-English.

Host:              Karen Livescu, klivescu at ttic.edu