[Colloquium] Reminder: Waxmonsky/Dissertation Defense/Jul 20, 2011

Margaret Jaffey margaret at cs.uchicago.edu
Tue Jul 19 10:17:56 CDT 2011


This is a reminder about Sonjia's defense tomorrow.

       Department of Computer Science/The University of Chicago

                     *** Dissertation Defense ***


Candidate:  Sonjia Waxmonsky

Date:  Wednesday, July 20, 2011

Time:  10:00 AM

Place:  Ryerson 277

Title: Natural Language Processing for Named Entities with
Word-internal Information

Abstract:
In this thesis we present three projects that focus on natural
language processing of named entities in text data, specifically for
scenarios and domains where rare and out-of-vocabulary (OOV) words are
problematic. First, we present work on the discovery of the language
of origin of a named entity. This has applications in speech and
language processing tasks, such as text-to-speech synthesis, since
language of origin can help predict pronunciation of OOV words. Word
origin recognition has also been studied for demographics and life
sciences as a component in the collection of ethnicity data. Previous
research has applied supervised machine learning methods to automate
this task, but this requires a set of hand-labeled training data for
each language represented in the model. Hand-labeled data may be
expensive to acquire, and additionally the set of origin languages may
not be known a priori. We consider how active learning can be applied
to minimize the amount of manual annotation needed to train a
successful supervised model. Second, we apply word origin modeling to
grapheme-to-phoneme (G2P) conversion of US surnames, using both
supervised and unsupervised approaches.

Finally, we present work in biomedical text mining, where we examine
named entity tagging of disease mentions in biomedical text. We
extract morphology information from disease terms to be included as
features in a Conditional Random Field model. We also show how
biomedical disease terms can be decompounded into their component stem
parts. Morphology information is acquired with the Linguistica toolkit
for unsupervised learning of morphology, which has the advantage that
a hand-segmented training set is not required for feature extraction.

Sonjia's advisor is Prof. John Goldsmith.

Log in to the Computer Science Department website for details,
including a draft copy of the dissertation:

 https://www.cs.uchicago.edu/phd/phd_announcements#wax

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)               (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

