[Colloquium] TTIC Colloquium: Torbjørn Svendsen, Norwegian University of Science and Technology

Liv Leader lleader at ttic.edu
Wed Dec 7 09:07:15 CST 2011


REMINDER:

When:     Thursday, December 8 @ 11

Where:   TTIC Conference Room #526, 6045 S. Kenwood Avenue, 5th Floor

Who:      Torbjørn Svendsen, Norwegian University of Science and Technology

Title:       Universal Speech Attribute Characterization for Automatic Speech
Recognition and Spoken Language Recognition

Abstract:

The fundamental unit for both automatic speech recognition and for
phonotactic (spoken) language recognition is the phoneme. Provided
that sufficient training material exists, accurate acoustic models can
be trained and applied. However, phonemes are language dependent.
In some instances, there is a lack of training material for a specific
language. In other cases, it can be of interest to obtain a
language-universal set of units. A possible approach is to focus on
sound production characteristics, such as manner and place of
articulation. Provided that reliable detectors/recognizers for these
speech attributes can be designed, the attributes will conceptually be
less language dependent than phones.

We have investigated the application of language-universal speech
attributes, based on a manner and place of articulation description, to
the problems of language recognition and speech recognition. The
approaches and experimental results will be reviewed in this
presentation.

For language recognition (LRE), speech attributes, such as manner and
place of articulation, are chosen to form a universal unit inventory
and used to build a set of language-universal attribute models with
data-driven modeling techniques. The vector space modeling approach to
LRE is adopted, where a spoken utterance is first decoded into a
sequence of attributes independently of its language. Then, a feature
vector is generated by using co-occurrence statistics of manner or
place units, and the final LRE decision is implemented with a vector
space language classifier. Experimental evidence demonstrates the
feasibility of the proposed technique, and also shows that it
outperforms the standard, state-of-the-art approach of parallel
phoneme recognizers followed by language modeling under the
same experimental conditions.
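
As a rough illustration of the vector space pipeline described above
(decode to attributes, build co-occurrence statistics, classify in vector
space), here is a minimal Python sketch. Everything in it is an assumption
made for illustration: the manner inventory, the use of bigram relative
frequencies as the co-occurrence statistic, the nearest-centroid cosine
classifier standing in for the unspecified vector space classifier, and the
made-up toy sequences for two hypothetical languages "A" and "B". It is not
the system presented in the talk.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical manner-of-articulation inventory (not the talk's unit set).
MANNER_UNITS = ["vowel", "stop", "fricative", "nasal", "glide", "silence"]
# Fixed ordering of all unit pairs, so every utterance maps to the same axes.
BIGRAMS = list(product(MANNER_UNITS, repeat=2))


def bigram_vector(attributes):
    """Map a decoded attribute sequence to bigram relative frequencies."""
    counts = Counter(zip(attributes, attributes[1:]))
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[pair] / total for pair in BIGRAMS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_centroids(labeled_sequences):
    """Average the bigram vectors of each language's training utterances."""
    sums, seen = {}, Counter()
    for language, attributes in labeled_sequences:
        acc = sums.setdefault(language, [0.0] * len(BIGRAMS))
        for i, value in enumerate(bigram_vector(attributes)):
            acc[i] += value
        seen[language] += 1
    return {lang: [x / seen[lang] for x in acc] for lang, acc in sums.items()}


def classify(attributes, centroids):
    """Pick the language whose centroid is most similar to the utterance."""
    vector = bigram_vector(attributes)
    return max(centroids, key=lambda lang: cosine(vector, centroids[lang]))


# Made-up attribute sequences; a real system would obtain these from the
# language-universal attribute decoder.
TOY_TRAINING = [
    ("A", ["stop", "vowel", "stop", "vowel", "nasal", "vowel"]),
    ("A", ["stop", "vowel", "nasal", "vowel", "stop", "vowel"]),
    ("B", ["fricative", "stop", "vowel", "fricative", "stop", "vowel"]),
    ("B", ["stop", "fricative", "vowel", "nasal", "fricative", "stop"]),
]
TOY_CENTROIDS = train_centroids(TOY_TRAINING)
```

Calling classify(decoded_sequence, TOY_CENTROIDS) returns the label of the
closest centroid. Because the decoder and the feature axes are shared across
languages, only the centroids (or whatever classifier replaces them) are
language specific in this sketch.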

For automatic speech recognition (ASR), our goal has been to design
good ASR systems with little or no language-specific speech data for
resource-limited languages. This work aims at demonstrating that a
recently proposed automatic speech attribute transcription framework
can play a key role in designing language-universal acoustic models by
sharing speech units among all target languages at the acoustic
phonetic attribute level. The language-universal acoustic models are
evaluated through phone recognition. Good cross-language attribute
detection and continuous phone recognition performance can be
accomplished for “unseen” languages using minimal training data from
the target languages to be recognized.

Host: Karen Livescu, klivescu at ttic.edu

-- 
Liv Leader
Human Resources Coordinator

Toyota Technological Institute Chicago
6045 S Kenwood Ave
Chicago, IL 60637
Phone- (773) 702-5033
Fax-     (773) 834-9881
Email-  lleader at ttic.edu
Web-   http://www.ttic.edu/