[Colloquium] Talk by Aris Xanthos, University of Lausanne on February 1, 2010

Katie Casey caseyk at cs.uchicago.edu
Tue Jan 26 14:21:46 CST 2010


DEPARTMENT OF COMPUTER SCIENCE

UNIVERSITY OF CHICAGO

Date: Monday, February 1, 2010
Time: 2:30 p.m.
Place: Ryerson 251, 1100 E. 58th Street

----------------------------------------------------------

Speaker:	Aris Xanthos

From:		University of Lausanne

Web page:	http://www.unil.ch/imm/page40072.html

Title: Compression-based learning of word separators

Abstract: In this talk, I will describe a novel algorithm for the unsupervised learning of word separators in raw text. The algorithm requires no language-specific knowledge regarding the text being processed. It relies solely on distributional properties of the text and uses the minimum description length (MDL) principle in order to partition characters into two subsets that correspond well with the traditional notion of letters and separators. The distinction between these types of characters emerges as an optimal solution to the problem of simultaneously compressing two elements: the lexicon that is obtained by tokenizing the text using the hypothesized separators, and the representation of the text under this lexicon. The performance of the algorithm is evaluated on the basis of electronic text in several languages.
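
Illustration (not part of the speaker's abstract): the two-part cost described above can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the algorithm presented in the talk. The names tokenize, description_length and greedy_separators are hypothetical, as are the coding choices: lexicon entries are spelled out at a fixed number of bits per character, tokens are coded by their empirical frequencies, each separator character counts as a one-character token, and the search is a simple greedy one.

```python
import math
from collections import Counter


def tokenize(text, separators):
    """Split text into words (maximal runs of non-separator characters).
    Assumption: each separator character is kept as its own one-character token."""
    tokens, word = [], []
    for ch in text:
        if ch in separators:
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens


def description_length(text, separators):
    """Toy two-part cost in bits: lexicon cost plus cost of the text under the lexicon."""
    tokens = tokenize(text, separators)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Fixed-length character code; the +1 accounts for an end-of-entry marker.
    bits_per_char = math.log2(len(set(text)) + 1)
    lexicon_bits = sum((len(tok) + 1) * bits_per_char for tok in counts)
    total = len(tokens)
    # Each token occurrence costs -log2 of its empirical probability.
    corpus_bits = sum(c * math.log2(total / c) for c in counts.values())
    return lexicon_bits + corpus_bits


def greedy_separators(text):
    """Greedy search (an assumption): repeatedly add the character that lowers
    the cost the most, and stop when no single addition lowers it."""
    separators = set()
    best = description_length(text, separators)
    while True:
        candidates = sorted(set(text) - separators)
        scored = [(description_length(text, separators | {ch}), ch) for ch in candidates]
        if not scored:
            break
        cost, ch = min(scored)
        if cost >= best:
            break
        separators.add(ch)
        best = cost
    return separators, best


if __name__ == "__main__":
    sample = "ab ab ab"
    print(description_length(sample, set()))
    print(description_length(sample, {" "}))
    print(greedy_separators(sample))
```

On very small or highly repetitive samples this toy cost may well prefer characters other than the space, so it should be read as an illustration of the lexicon-plus-text trade-off rather than as a working separator detector.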


Please note that refreshments will be served after the talk at 3:30 p.m. in Ryerson 255.