[Theory] 4/30 Talks at TTIC: Joseph (Yossi) Keshet, Technion
Mary Marre via Theory
theory at mailman.cs.uchicago.edu
Thu Apr 24 12:17:40 CDT 2025
*When:* Wednesday, April 30, 2025 at *10:00 am CT*
*Where:* Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, *Room 529*
*Virtually:* livestream via Panopto
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d72690c6-1754-49c1-9107-b2c90104a6ce>
*Who:* Joseph (Yossi) Keshet, Technion
*Title:* From Raw Waveform to Spectrum: Practical and Theoretical Advances
in Diffusion Models for Speech Generation
*Abstract:* In this talk, I will present two complementary contributions
that push the boundaries of diffusion models for speech generation. I will
start by presenting DiffAR, an autoregressive diffusion model capable of
generating high-fidelity raw speech waveforms end-to-end. By operating
directly in the waveform domain and conditioning on overlapping frames,
DiffAR achieves coherent, expressive, and naturally varied speech
generation. Specifically, it allows the creation of local acoustic
behaviors, like vocal fry, which make the overall waveform sound more
natural.
Second, I will introduce a novel spectral analysis framework that
interprets the inference process of diffusion models through a
frequency-domain lens. This perspective enables principled design of noise
schedules that are aligned with the spectral characteristics of the target
data, replacing empirical heuristics with theoretically grounded methods.
These works were conducted in collaboration with Roi Benita and Michael
Elad, and are detailed in the following papers:
https://arxiv.org/abs/2310.01381
https://arxiv.org/abs/2502.00180
*Bio:* Joseph (Yossi) Keshet received his B.Sc. and M.Sc. degrees in
Electrical Engineering from Tel Aviv University in 1994 and 2002,
respectively. He completed his Ph.D. in Computer Science in 2008 at the
School of Computer Engineering, The Hebrew University of Jerusalem. From
2008 to 2009, he was a postdoctoral researcher at EPFL and the IDIAP
Research Institute in Switzerland. He then served as a Research Assistant
Professor at TTIC from 2009 to 2012. Between 2013 and 2022, he was an
Associate Professor in the Department of Computer Science at Bar-Ilan
University. Since 2022, he has been an Associate Professor at the Faculty
of Electrical and Computer Engineering at the Technion. His research
interests include speech recognition, speech synthesis, and speech analysis.
*Host:* *Karen Livescu* <klivescu at ttic.edu>
Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue, Rm 517*
*Chicago, IL 60637*
*773-834-1757*
*mmarre at ttic.edu*