[Theory] TOMORROW: 4/30 Talks at TTIC: Joseph (Yossi) Keshet, Technion

Tue Apr 29 14:08:40 CDT 2025

*When:*        Wednesday, April 30, 2025 at *10:00 am CT*

*Where:*       Talk will be given *live, in-person* at

                   TTIC, 6045 S. Kenwood Avenue

                   5th Floor, *Room 529*

*Virtually:*   livestream via Panopto
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d72690c6-1754-49c1-9107-b2c90104a6ce>

*Who:*         Joseph (Yossi) Keshet, Technion

*Title:* From Raw Waveform to Spectrum: Practical and Theoretical Advances
in Diffusion Models for Speech Generation

*Abstract:* In this talk, I will present two complementary contributions
that push the boundaries of diffusion models for speech generation. I will
start by presenting DiffAR, an autoregressive diffusion model capable of
generating high-fidelity raw speech waveforms end-to-end. By operating
directly in the waveform domain and conditioning on overlapping frames,
DiffAR achieves coherent, expressive, and naturally varied speech
generation. Specifically, it can produce local acoustic behaviors,
such as vocal fry, which make the overall waveform sound more
natural.

Second, I will introduce a novel spectral analysis framework that
interprets the inference process of diffusion models through a
frequency-domain lens. This perspective enables principled design of noise
schedules that are aligned with the spectral characteristics of the target
data, replacing empirical heuristics with theoretically grounded methods.

These works were conducted in collaboration with Roi Benita and Michael
Elad, and are detailed in the following papers:

https://arxiv.org/abs/2310.01381

https://arxiv.org/abs/2502.00180

*Bio:* Joseph (Yossi) Keshet received his B.Sc. and M.Sc. degrees in
Electrical Engineering from Tel Aviv University in 1994 and 2002,
respectively. He completed his Ph.D. in Computer Science in 2008 at the
School of Computer Engineering, The Hebrew University of Jerusalem. From
2008 to 2009, he was a postdoctoral researcher at EPFL and the IDIAP
Research Institute in Switzerland. He then served as a Research Assistant
Professor at TTIC from 2009 to 2012. Between 2013 and 2022, he was an
Associate Professor in the Department of Computer Science at Bar-Ilan
University. Since 2022, he has been an Associate Professor at the Faculty
of Electrical and Computer Engineering at the Technion. His research
interests include speech recognition, speech synthesis, and speech analysis.

*Host:* *Karen Livescu* <klivescu at ttic.edu>

Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue, Rm 517*
*Chicago, IL  60637*
*773-834-1757*
*mmarre at ttic.edu*
