[Colloquium] TTIC Talks: Liang Huang, ISI

Wed Feb 8 13:47:39 CST 2012

When:     Friday, February 17 @ 11 a.m.

Where:   TTIC Conference Room #526, 6045 S. Kenwood Avenue, 5th Floor

Who:      Liang Huang, ISI

Title:       Efficient Search and Learning for Language Understanding and
Translation

What is in common between translating from English into Chinese and
compiling C++ into machine code? And yet what are the differences that make
the former so much harder for computers? How can computers learn from human
translators?

This talk sketches an efficient (linear-time) "understanding + rewriting"
paradigm for machine translation inspired by both human translators as well
as compilers. In this paradigm, a source language sentence is first parsed
into a syntactic tree, which is then recursively converted into a target
language sentence via tree-to-string rewriting rules. In both
"understanding" and "rewriting" stages, this paradigm closely resembles the
efficiency and incrementality of both human processing and compiling. We
will discuss these two stages in turn.

First, for the "understanding" part, we present a linear-time approximate
dynamic programming algorithm for incremental parsing that is as accurate
as those much slower (cubic-time) chart parsers, while being as fast as
those fast but lossy greedy parsers, thus getting the advantages of both
worlds for the first time, achieving state-of-the-art speed and accuracy.
But how do we efficiently learn such a parsing model with approximate
inference from huge amounts of data? We propose a general framework for
structured prediction based on the structured perceptron that is guaranteed
to succeed with inexact search and works well in practice.

Next, the "rewriting" stage translates these source-language parse trees
into the target language. But parsing errors from the previous stage
adversely affect translation quality. An obvious solution is to use the
top-k parses, rather than the 1-best tree, but this only helps a little bit
due to the limited scope of the k-best list. We instead propose a
"forest-based approach", which translates a packed forest encoding
*exponentially* many parses in a polynomial space by sharing common
subtrees. Large-scale experiments showed very significant improvements in
terms of translation quality, which outperforms the leading systems in
literature. Like the "understanding" part, the translation algorithm here
is also linear-time and incremental, thus resembles human translation.

Host: Karen Livescu, klivescu at ttic.edu

-- 
Liv Leader
Human Resources Coordinator

Toyota Technological Institute Chicago
6045 S Kenwood Ave
Chicago, IL 60637
Phone- (773) 702-5033
Fax-     (773) 834-9881
Email-  lleader at ttic.edu <jam at ttic.edu>
Web-   www.ttic.edu
<http://www.ttic.edu/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.cs.uchicago.edu/pipermail/colloquium/attachments/20120208/5a02f140/attachment.htm