[Colloquium] Talk by Aris Xanthos, University of Lausanne on February 1, 2010
Katie Casey
caseyk at cs.uchicago.edu
Tue Jan 26 14:21:46 CST 2010
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF CHICAGO
Date: Monday, February 1, 2010
Time: 2:30 p.m.
Place: Ryerson 251, 1100 E. 58th Street
----------------------------------------------------------
Speaker: Aris Xanthos
From: University of Lausanne
Web page: http://www.unil.ch/imm/page40072.html
Title: Compression-based learning of word separators
Abstract: In this talk, I will describe a novel algorithm for the unsupervised learning of word separators in raw text. The algorithm requires no language-specific knowledge regarding the text being processed. It relies solely on distributional properties of the text and uses the minimum description length (MDL) principle in order to partition characters into two subsets that correspond well with the traditional notion of letters and separators. The distinction between these types of characters emerges as an optimal solution to the problem of simultaneously compressing two elements: the lexicon that is obtained by tokenizing the text using the hypothesized separators, and the representation of the text under this lexicon. The performance of the algorithm is evaluated on the basis of electronic text in several languages.
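Illustration (not part of the speaker's abstract): the two-part cost mentioned above can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not the speaker's algorithm. The cost model (a fixed number of bits per lexicon character, plus a unigram code for the tokens), the choice to treat each separator character as its own token, the greedy search, and the names tokenize, description_length and greedy_separators are all hypothetical choices made here for illustration.

```python
import math
from collections import Counter


def tokenize(text, separators):
    """Split text into words (maximal runs of non-separator characters);
    each separator character becomes a token of its own (an assumption)."""
    tokens, word = [], []
    for ch in text:
        if ch in separators:
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens


def description_length(text, separators):
    """Two-part cost in bits: the lexicon, plus the text encoded under it.
    Assumed model: every lexicon entry is spelled out character by character
    (plus one end-of-entry mark), and tokens use a unigram code."""
    bits_per_char = math.log2(len(set(text)) + 1)
    counts = Counter(tokenize(text, separators))
    total = sum(counts.values())
    lexicon_cost = sum((len(t) + 1) * bits_per_char for t in counts)
    text_cost = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_cost + text_cost


def greedy_separators(text):
    """Hypothetical search: repeatedly add the single character whose
    promotion to separator lowers the total cost the most."""
    separators = set()
    best = description_length(text, separators)
    improved = True
    while improved:
        improved = False
        for ch in sorted(set(text) - separators):
            cost = description_length(text, separators | {ch})
            if cost < best:
                best_ch, best, improved = ch, cost, True
        if improved:
            separators.add(best_ch)
    return separators
```

Calling greedy_separators on a raw string returns the set of characters that this toy cost model prefers to treat as separators. With an empty separator set the whole text is a single lexicon entry, so the lexicon is expensive and the encoded text is free; adding separators trades lexicon size against the cost of encoding the token sequence, which is the trade-off the abstract describes.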
Please note that refreshments will be served after the talk at 3:30 in RY 255.