Colloquia Reminder: today's talk by Dragomir Radev

Margery Ishmael marge at cs.uchicago.edu
Mon Apr 16 09:25:51 CDT 2001


Today, April 16 at 2:30 p.m. in Ryerson 251

Dragomir R. Radev, University of Michigan
Title: Domain-independent Natural Language Question Answering

Abstract: Traditional information retrieval systems (including modern Web-based
search engines) operate as follows: a user types in a query and the IR
system returns a set of documents ordered by their expected relevance
to the user query and, by extension, to the user's information
need. This framework suffers from two problems: first, users must
formulate their information need as a query in an engine-specific
syntax; and second, each returned document may contain only a small
snippet of text that is
relevant to the user query. We address these two problems in the
context of domain-independent, natural language question answering. In
our scenario, a user types in factual natural language questions such
as "When did the Neanderthal man live?" or "Which Frenchman declined
the Nobel Prize for Literature for ideological reasons?" and gets back
the precise answer to these questions rather than a set of documents
that simply contain the same keywords as the questions.

I will describe two approaches for natural language question
answering. The first algorithm, AnSel, uses logistic regression to
determine the set of likeliest answers among a larger set produced by
an IR system based on predictive annotation (Prager et al. 00, Radev
et al. 00). AnSel achieves excellent performance on a standardized
document collection; however, it requires a large amount of annotated
data to operate and does not scale directly to the World-Wide Web.
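The abstract does not list AnSel's actual features or trained weights, but the core idea of ranking candidate answers with logistic regression can be illustrated with a toy sketch. The feature names, weights, and candidate answers below are invented for illustration only:

```python
import math

# Hypothetical feature weights (illustrative only, not AnSel's trained values).
WEIGHTS = {"type_match": 2.0, "keyword_overlap": 1.5, "proximity": 0.8}
BIAS = -2.5

def score(features):
    """Logistic-regression score: sigmoid of the weighted feature sum."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Candidate answers with made-up feature values, as if extracted from
# passages returned by the underlying IR system.
candidates = [
    ("300,000 to 30,000 years ago",
     {"type_match": 1, "keyword_overlap": 0.9, "proximity": 0.8}),
    ("in caves across Europe",
     {"type_match": 0, "keyword_overlap": 0.6, "proximity": 0.5}),
]

# Rank candidates by their score; the top one is the system's answer.
ranked = sorted(candidates, key=lambda c: score(c[1]), reverse=True)
print(ranked[0][0])
```

In the real system, features would come from predictive annotation of the retrieved documents (e.g. whether a candidate's semantic type matches the expected answer type of the question), and the weights would be fit on annotated training data.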

In the second approach, question answering is viewed as an instance of
the noisy channel problem. We assume that there exists a single best
query Q that achieves high precision and recall given a particular
information need, a particular search engine, and a particular
document collection. The query Q is then transformed into a
grammatical natural language question N through a noisy channel. Our
goal is, given N, to recover the original query Q among the space U_Q
of all possible queries that can be generated from N using a limited
sequence of linguistically-justified transformation operators. Our
algorithm, QALM, is based on expectation maximization and learns
which paraphrase Q-hat among U_Q achieves the highest score on a
standardized benchmark. That paraphrase is then assumed to be the
closest approximation to the original query Q. The algorithm makes use
of a moderate number of labeled question-answer pairs for
bootstrapping. Its generalization ability is based on the use of
several classes of linguistic (lexico-semantic and collocational)
knowledge.
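A much-simplified sketch of the search over U_Q follows. It replaces QALM's expectation-maximization learning with a brute-force argmax, and uses only one transformation operator (stopword deletion) and a stand-in scoring function, since the abstract does not specify the real operators or benchmark:

```python
from itertools import combinations

STOPWORDS = {"when", "did", "the", "which", "for", "a", "an", "of"}

def candidate_queries(question):
    """Generate a toy U_Q: queries reachable from question N via one
    transformation operator, stopword deletion (keeping word order)."""
    words = [w.strip("?").lower() for w in question.split()]
    content = [w for w in words if w not in STOPWORDS]
    cands = set()
    for r in range(2, len(content) + 1):
        for combo in combinations(content, r):
            cands.add(" ".join(combo))
    return cands

def benchmark_score(query):
    """Stand-in for retrieval precision/recall measured on labeled
    question-answer pairs; here it just favors short queries made of
    longer (rarer) words, purely for illustration."""
    words = query.split()
    return sum(len(w) for w in words) / (len(words) + 1)

def best_paraphrase(question):
    """Recover Q-hat: the candidate query with the highest score."""
    return max(candidate_queries(question), key=benchmark_score)

q_hat = best_paraphrase("When did the Neanderthal man live?")
print(q_hat)
```

The real algorithm would instead score each candidate by actually running it against a search engine and comparing the retrieved results to known answers, using EM over a moderate number of labeled question-answer pairs to learn which transformation operators tend to produce high-scoring queries.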
http://www.si.umich.edu/~radev


*The talk will be followed by refreshments in Ryerson 255*
-- 
Margery Ishmael
Department of Computer Science
The University of Chicago
1100 E. 58th Street
Chicago, IL. 60637

Tel. 773-834-8977  Fax. 773-702-8487
