[Colloquium] Reminder: Reddy/Dissertation Defense/Jul 18, 2012

Margaret Jaffey margaret at cs.uchicago.edu
Fri Jul 13 09:30:28 CDT 2012


This is a reminder about Sravana's defense that will be held on
Wednesday.

       Department of Computer Science/The University of Chicago

                     *** Dissertation Defense ***


Candidate:  Sravana Reddy

Date:  Wednesday, July 18, 2012

Time:  10:00 AM

Place:  Ryerson 276

Title: Learning Pronunciations from Unlabeled Evidence

Abstract:
The pronunciation of a word represented in an alphabetic writing
system (such as this one) is relatively transparent -- but a
language's sounds change over time and vary across space, while its
spellings tend to remain static, resulting in some amount of
divergence between the written and spoken forms. The introduction of
loanwords and proper names from other languages with different
phonologies and scripts further complicates the relationship between
orthography and pronunciation.

However, there are sources of information about the sound of a word
besides spelling. Speech is the most natural example: a word's
pronunciation is greatly clarified upon hearing it in a spoken
utterance. In the case of proper names, knowing the linguistic or
ethnic origin of the name is often instrumental in determining how it
should be pronounced. Rhymes in poems or songs also provide a cue to
pronunciation, which is particularly relevant for inferring the sound
of a word at an earlier point in history.

Predicting the pronunciations of words given their written
representations is a problem encountered regularly by human speakers,
who will often use secondary contextual information when the spelling
by itself proves to be ambiguous. Building a system to generalize from
a pronunciation dictionary and hypothesize the pronunciations of new
words, often referred to as grapheme-to-phoneme modeling, is also an
important engineering problem in speech technology, as well as in the
machine translation of named entities.

Using extra-orthographic evidence in a computational model for
grapheme-to-phoneme conversion requires sufficiently large quantities
of data that provide this information. Collecting
annotated data -- speech labeled with the words that it contains,
names with their ethnic origins, or poetry with rhyming patterns --
can be extremely difficult and expensive. On the other hand, unlabeled
data -- speech recordings, lists of names, archives of poetry -- is
plentiful. This thesis presents methods for learning
pronunciations of words using these forms of unlabeled evidence.

Sravana's advisor is Prof. John Goldsmith.

Log in to the Computer Science Department website for details,
including a draft copy of the dissertation:

 https://www.cs.uchicago.edu/phd/phd_announcements#sravana

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)               (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

