[Theory] TALK NOW!: 1/16 Talks at TTIC: David Harwath, MIT
Mary Marre
mmarre at ttic.edu
Thu Jan 16 11:03:34 CST 2020
*When:* Thursday, January 16th at 11:00 am
*Where:* TTIC, 6045 S. Kenwood Avenue, 5th Floor, Room 526
*Who:* David Harwath, MIT
*Title:* Learning Spoken Language Through Vision
Abstract: Humans learn spoken language and visual perception at an early
age by being immersed in the world around them. Why can't computers do the
same? In this talk, I will describe our work to develop methodologies for
grounding continuous speech signals at the raw waveform level to natural
image scenes. I will present self-supervised models capable of jointly
discovering spoken words and the visual objects to which they refer, all
without conventional annotations in either modality. I will show that these
models can be applied across multiple languages, and that the visual domain
can function as an "interlingua," enabling the discovery of word-level
semantic translations at the waveform level.
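For those curious how this kind of audio-visual grounding is typically set up, below is a minimal, hypothetical sketch (not the speaker's code, and not taken from the talk) of a self-supervised matching objective between spoken-caption embeddings and image embeddings; the contrastive (InfoNCE-style) loss, the embedding dimension, and the batch layout are all illustrative assumptions.

# Hypothetical sketch: contrastive audio-image matching, in the spirit of
# the self-supervised grounding models described in the abstract above.
import torch
import torch.nn.functional as F

def matching_loss(audio_emb, image_emb, temperature=0.07):
    """audio_emb, image_emb: (batch, dim) L2-normalized embeddings of paired
    spoken captions and images; matched pairs share the same row index."""
    logits = audio_emb @ image_emb.t() / temperature    # pairwise similarity matrix
    targets = torch.arange(audio_emb.size(0))           # i-th audio matches i-th image
    # Symmetric cross-entropy: audio retrieves its image and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Example with random stand-in embeddings (real models would use learned
# speech and image encoders over paired captions and photos).
audio = F.normalize(torch.randn(8, 512), dim=-1)
image = F.normalize(torch.randn(8, 512), dim=-1)
print(matching_loss(audio, image).item())

Training such an objective on paired images and spoken captions requires no transcripts or object labels, which is what allows word-like and object-like units to emerge without conventional annotations.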
Bio: David Harwath is a research scientist in the Spoken Language Systems
group at the MIT Computer Science and Artificial Intelligence Laboratory
(CSAIL). His research focuses on multimodal, self-supervised learning
algorithms for speech, audio, vision, and text. His doctoral thesis,
supervised by James Glass, introduced models for the joint perception of
speech and vision; the thesis received the 2018 George M. Sprowls Award for
the best Ph.D. thesis in computer science at MIT. He holds a Ph.D. in
computer science from MIT (2018), an S.M. in computer science from MIT
(2013), and a B.S. in electrical engineering from UIUC (2010).
Host: Karen Livescu <klivescu at ttic.edu>
Mary C. Marre
Administrative Assistant
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 517*
*Chicago, IL 60637*
*p: (773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu*