[Colloquium] Re: REMINDER: 10/31 Talks at TTIC: Takafumi Koshinaka, Ken Hanazawa, Takayuki Arakawa (NEC, Tokyo Institute of Technology)

Mary Marre via Colloquium colloquium at mailman.cs.uchicago.edu
Wed Oct 31 10:17:46 CDT 2018


*When:* Wednesday, October 31st at 11:00 am

*Where:* TTIC, 6045 S. Kenwood Avenue, 5th Floor, Room 526



*Who:* Takafumi Koshinaka, Ken Hanazawa and Takayuki Arakawa (NEC
and Tokyo Institute of Technology)


*Title:* Attentive Statistics Pooling for Deep Speaker Embedding

*Abstract:* We propose attentive statistics pooling for deep speaker
embedding in text-independent speaker verification. In conventional speaker
embedding, frame-level features are averaged over all the frames of a
single utterance to form an utterance-level feature. Our method utilizes an
attention mechanism to give different weights to different frames and
generates not only weighted means but also weighted standard deviations. In
this way, it can capture long-term variations in speaker characteristics
more effectively. An evaluation on the NIST SRE 2012 and the VoxCeleb data
sets shows that it reduces equal error rates (EERs) from the conventional
method by 7.5% and 8.1%, respectively.
This talk is based on arXiv:1803.10963 with some additional results we have
recently obtained.
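
For readers new to the idea, the pooling step described in the abstract can
be sketched roughly as follows. This is only an illustrative NumPy sketch,
not the speakers' implementation: the function name, the parameter names
(W, b, v), and the tanh scoring function are assumptions made here for the
example. The abstract itself only states that attention weights are applied
per frame and that both a weighted mean and a weighted standard deviation
are produced.

```python
import numpy as np

def attentive_stats_pooling(H, W, b, v, eps=1e-8):
    """Pool frame-level features H (T frames x D dims) into one vector.

    W (D x A), b (A,), and v (A,) are hypothetical attention parameters;
    in a real system they would be learned with the rest of the network.
    """
    # Scalar attention score per frame (assumed form: v . tanh(W h_t + b)).
    scores = np.tanh(H @ W + b) @ v            # shape (T,)
    # Softmax over frames, shifted by the max for numerical stability.
    scores = scores - scores.max()
    alpha = np.exp(scores)
    alpha /= alpha.sum()                       # weights sum to 1
    # Weighted mean and weighted standard deviation over frames.
    mean = alpha @ H                           # shape (D,)
    var = alpha @ (H * H) - mean ** 2
    std = np.sqrt(np.clip(var, eps, None))     # guard against tiny negatives
    # Utterance-level feature: concatenation of the two statistics.
    return np.concatenate([mean, std]), alpha

# Example call with random stand-in data (50 frames, 4-dim features).
rng = np.random.default_rng(0)
H = rng.normal(size=(50, 4))
W = rng.normal(size=(4, 3))
b = np.zeros(3)
v = rng.normal(size=3)
embedding, weights = attentive_stats_pooling(H, W, b, v)
```

When all attention scores are equal, the weights become uniform and the
result reduces to the plain mean and standard deviation over frames.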



*Host: Sadaoki Furui <furui at ttic.edu>*


Mary C. Marre
Administrative Assistant
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 517*
*Chicago, IL  60637*
*p:(773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu*

