[Colloquium] Reminder: Sihem Amer-Yahia's talk today

Margery Ishmael marge at cs.uchicago.edu
Thu Nov 20 10:19:18 CST 2003


------------------------------------------------------------------------

DEPARTMENT OF COMPUTER SCIENCE - TALK

------------------------------------------------------------------------

Date: Thursday, November 20, 2003
Time: 3:00 p.m.
Place: Ryerson 251

Speaker: SIHEM AMER-YAHIA from AT&T Labs-Research

Title: Querying Text in XML: TeXQuery and PIX

Abstract:

The Full-Text Task Force (FTTF) within the W3C has been working on
designing a query language that extends XQuery, the W3C recommended
standard for querying XML documents, with the ability to query text in
XML documents. Examples of such queries can be found at
http://www.w3.org/TR/xmlquery-full-text-use-cases/
TeXQuery is a language proposal that we made to the FTTF. It provides
a rich set of
fully composable full-text search primitives, such as Boolean
connectives, phrase matching, proximity distance and stemming. I will
give an overview of the language and some of the challenges that arise
from designing such a language. I will then focus on one technical
challenge: phrase matching in XML.

Phrase matching presents new challenges in XML as documents contain
text that may interleave with arbitrary markup, thwarting search
techniques that require strict contiguity or close proximity of
words. We develop PIX, an algorithm and system for phrase matching in
XML that can selectively ignore markup when matching
phrases. PIX supports exact and approximate phrase matching. Our
algorithm is optimal among algorithms that scan the complete inverted
lists of query tokens.
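
To illustrate the problem only, here is a minimal sketch of phrase
matching that selectively ignores markup. It is not the PIX algorithm:
it scans the document text naively rather than using inverted lists,
and it handles exact matching only. The names tokens, phrase_match and
transparent are hypothetical, and the rule that any tag outside the
transparent set blocks a phrase is an assumption made for this example.

```python
import re
import xml.etree.ElementTree as ET

BARRIER = None  # marks an element boundary that a phrase may not cross


def tokens(elem, transparent):
    """Yield lowercase word tokens in document order, with BARRIER at
    the boundaries of every element whose tag is not in `transparent`."""
    opaque = elem.tag not in transparent
    if opaque:
        yield BARRIER
    yield from re.findall(r"\w+", (elem.text or "").lower())
    for child in elem:
        yield from tokens(child, transparent)
        # Text that follows the child's closing tag belongs to `elem`.
        yield from re.findall(r"\w+", (child.tail or "").lower())
    if opaque:
        yield BARRIER


def phrase_match(xml_text, phrase, transparent=frozenset()):
    """Return True if the words of `phrase` occur contiguously once the
    tags listed in `transparent` are ignored."""
    stream = list(tokens(ET.fromstring(xml_text), transparent))
    words = phrase.lower().split()
    n = len(words)
    return any(stream[i:i + n] == words for i in range(len(stream) - n + 1))


doc = "<p>phrase <b>matching</b> in XML</p>"
print(phrase_match(doc, "phrase matching in", transparent={"b"}))
print(phrase_match(doc, "phrase matching in"))
```

With the b tag treated as transparent the phrase is found; with no
tags ignored, the same phrase is broken by the markup around
"matching".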

TeXQuery is joint work with Chavdar Botev (Cornell
University) and Jayavel Shanmugasundaram (Cornell University).
http://www.cs.cornell.edu/database/TeXQuery

PIX is joint work with Divesh Srivastava (AT&T
Labs), Mary Fernandez (AT&T Labs), and Yu Xu (UCSD).
http://www.research.att.com/~sihem/PIX/index.html

Speaker's URL: http://www.research.att.com/~sihem/


HOST: Svetlozar Nestorov

*Refreshments will follow the talk in Ryerson 255*

People in need of assistance should call 773-834-8977 in advance.




