<div dir="ltr"><div dir="ltr"><div><div class="gmail_default" style="font-size:small"><font style="font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> Wednes</font><span class="gmail_default" style="font-family:arial,sans-serif;color:rgb(0,0,0)">day, April 30,<span class="gmail_default"> </span>2025</span><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> at</font><b><font color="#000000" style="font-family:arial,sans-serif"> <u style="background-color:rgb(255,255,0)">10:00</u></font></b><font color="#000000"><u><b><font face="arial, sans-serif"><span style="background-color:rgb(255,255,0)"> am</span></font></b><b style="background-color:rgb(255,255,0)"><font face="arial, sans-serif"> CT </font></b></u></font></font></font></div><div><div><div class="gmail_default"><div class="gmail_default"><div class="gmail_default"><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b><font color="#500050" face="arial, sans-serif"><br></font></b></p><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><b><font color="#500050">Where: </font></b><font color="#000000">Talk will be given </font><font color="#000000" style="font-weight:bold"><u>live, in-person</u></font><font style="font-weight:bold"> </font>at<br></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font color="#500050"> </font><font color="#000000"> TTIC, 6045 S. 
Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000"> 5th Floor, <b>Room 530</b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px">Virtually:</b><span style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px"> </span><span style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#3c4043"> </font></span><i style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#000000">livestream via </font><b style="color:rgb(0,0,255)"><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d72690c6-1754-49c1-9107-b2c90104a6ce" target="_blank">panopto</a></b></i></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="letter-spacing:0.2px"><b style="font-style:italic;font-family:arial,sans-serif;color:rgb(0,0,255)"> </b></span><b style="letter-spacing:0.2px"><font size="1" face="tahoma, sans-serif"> </font></b><b style="color:rgb(60,64,67);letter-spacing:0.2px"><font face="arial, sans-serif"> </font></b></p><p 
class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="color:rgb(80,0,80);vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> <font color="#500050"> </font><font color="#000000"><font color="#500050"> </font></font></font></font></font>Joseph (Yossi) Keshet, Technion</p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><div style="border-top:none;border-right:none;border-left:none;border-bottom:2.25pt solid rgb(11,118,159);padding:0in 0in 1pt"></div><div><br></div></div></div><div class="gmail_default"><div dir="ltr"><b>Title:</b> From Raw Waveform to Spectrum: Practical and Theoretical Advances in Diffusion Models for Speech Generation<br><b><br></b></div><div dir="ltr"><b>Abstract:</b> In this talk, I will present two complementary contributions that push the boundaries of diffusion models for speech generation. I will start by presenting DiffAR, an autoregressive diffusion model capable of generating high-fidelity raw speech waveforms end-to-end. By operating directly in the waveform domain and conditioning on overlapping frames, DiffAR achieves coherent, expressive, and naturally varied speech generation. Specifically, it allows the creation of local acoustic behaviors, like vocal fry, which makes the overall waveform sound more natural.<br><br>Second, I will introduce a novel spectral analysis framework that interprets the inference process of diffusion models through a frequency-domain lens. 
This perspective enables principled design of noise schedules that are aligned with the spectral characteristics of the target data, replacing empirical heuristics with theoretically grounded methods.<br><br>These works were conducted in collaboration with Roi Benita and Michael Elad, and are detailed in the following papers:<p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2310.01381" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2310.01381</font></a></p><p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2502.00180" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2502.00180</font></a></p><p style="color:rgb(31,31,31)"><font face="arial, sans-serif"><b>Bio: </b>Joseph (Yossi) Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering from Tel Aviv University in 1994 and 2002, respectively. He completed his Ph.D. in Computer Science in 2008 at the School of Computer Engineering, The Hebrew University of Jerusalem. From 2008 to 2009, he was a postdoctoral researcher at EPFL and the IDIAP Research Institute in Switzerland. He then served as a Research Assistant Professor at TTIC from 2009 to 2012. Between 2013 and 2022, he was an Associate Professor in the Department of Computer Science at Bar-Ilan University. Since 2022, he has been an Associate Professor at the Faculty of Electrical and Computer Engineering at the Technion. 
His research interests include speech recognition, speech synthesis, and speech analysis.</font></p></div></div></div></div><div><div class="gmail_default"><font face="arial, sans-serif"><b>Host: </b><a href="mailto:klivescu@ttic.edu" rel="noreferrer" target="_blank"><b>Karen Livescu</b></a></font></div></div><div class="gmail_default"><br></div><br clear="all"></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small"><span class="gmail_default" style="font-size:small"></span>Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br><br clear="all"></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. 
Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Wed, Apr 30, 2025 at 10:02 AM Mary Marre <<a href="mailto:mmarre@ttic.edu">mmarre@ttic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><div style="font-size:small"><font style="font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> Wednes</font><span class="gmail_default" style="font-family:arial,sans-serif;color:rgb(0,0,0)">day, April 30,<span class="gmail_default"> </span>2025</span><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> at</font><b><font color="#000000" style="font-family:arial,sans-serif"> <u style="background-color:rgb(255,255,0)">10:00</u></font></b><font color="#000000"><u><b><font face="arial, sans-serif"><span style="background-color:rgb(255,255,0)"> am</span></font></b><b style="background-color:rgb(255,255,0)"><font face="arial, sans-serif"> CT </font></b></u></font></font></font></div><div><div><div><div><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b><font color="#500050" face="arial, sans-serif"><br></font></b></p><p 
style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><b><font color="#500050">Where: </font></b><font color="#000000">Talk will be given </font><font color="#000000" style="font-weight:bold"><u>live, in-person</u></font><font style="font-weight:bold"> </font>at<br></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font color="#500050"> </font><font color="#000000"> TTIC, 6045 S. Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000"> 5th Floor, <b>Room 539</b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px">Virtually:</b><span style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px"> </span><span style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#3c4043"> </font></span><i style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#000000">livestream via 
</font><b style="color:rgb(0,0,255)"><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d72690c6-1754-49c1-9107-b2c90104a6ce" target="_blank">panopto</a></b></i></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="letter-spacing:0.2px"><b style="font-style:italic;font-family:arial,sans-serif;color:rgb(0,0,255)"> </b></span><b style="letter-spacing:0.2px"><font size="1" face="tahoma, sans-serif"> </font></b><b style="color:rgb(60,64,67);letter-spacing:0.2px"><font face="arial, sans-serif"> </font></b></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="color:rgb(80,0,80);vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> <font color="#500050"> </font><font color="#000000"><font color="#500050"> </font></font></font></font></font>Joseph (Yossi) Keshet, Technion</p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><div style="border-top:none;border-right:none;border-left:none;border-bottom:2.25pt solid rgb(11,118,159);padding:0in 0in 1pt"></div><div><br></div></div></div><div><div dir="ltr"><b>Title:</b> From Raw Waveform to Spectrum: Practical and Theoretical Advances in Diffusion Models for Speech Generation<br><b><br></b></div><div dir="ltr"><b>Abstract:</b> In this talk, I will present two complementary contributions that push the boundaries of diffusion models for speech generation. 
I will start by presenting DiffAR, an autoregressive diffusion model capable of generating high-fidelity raw speech waveforms end-to-end. By operating directly in the waveform domain and conditioning on overlapping frames, DiffAR achieves coherent, expressive, and naturally varied speech generation. Specifically, it allows the creation of local acoustic behaviors, like vocal fry, which makes the overall waveform sound more natural.<br><br>Second, I will introduce a novel spectral analysis framework that interprets the inference process of diffusion models through a frequency-domain lens. This perspective enables principled design of noise schedules that are aligned with the spectral characteristics of the target data, replacing empirical heuristics with theoretically grounded methods.<br><br>These works were conducted in collaboration with Roi Benita and Michael Elad, and are detailed in the following papers:<p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2310.01381" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2310.01381</font></a></p><p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2502.00180" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2502.00180</font></a></p><p style="color:rgb(31,31,31)"><font face="arial, sans-serif"><b>Bio: </b>Joseph (Yossi) Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering from Tel Aviv University in 1994 and 2002, respectively. He completed his Ph.D. in Computer Science in 2008 at the School of Computer Engineering, The Hebrew University of Jerusalem. 
From 2008 to 2009, he was a postdoctoral researcher at EPFL and the IDIAP Research Institute in Switzerland. He then served as a Research Assistant Professor at TTIC from 2009 to 2012. Between 2013 and 2022, he was an Associate Professor in the Department of Computer Science at Bar-Ilan University. Since 2022, he has been an Associate Professor at the Faculty of Electrical and Computer Engineering at the Technion. His research interests include speech recognition, speech synthesis, and speech analysis.</font></p></div></div></div></div><div><div><font face="arial, sans-serif"><b>Host: </b><a href="mailto:klivescu@ttic.edu" rel="noreferrer" target="_blank"><b>Karen Livescu</b></a></font></div></div><div><br></div><br clear="all"></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small"><span class="gmail_default" style="font-size:small"></span>Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. 
Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 30, 2025 at 9:54 AM Mary Marre <<a href="mailto:mmarre@ttic.edu" target="_blank">mmarre@ttic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><div style="font-size:small"><font style="font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> Wednes</font><span class="gmail_default" style="font-family:arial,sans-serif;color:rgb(0,0,0)">day, April 30,<span class="gmail_default"> </span>2025</span><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> at</font><b><font color="#000000" style="font-family:arial,sans-serif"> <u style="background-color:rgb(255,255,0)">10:00</u></font></b><font color="#000000"><u><b><font face="arial, sans-serif"><span style="background-color:rgb(255,255,0)"> am</span></font></b><b style="background-color:rgb(255,255,0)"><font face="arial, sans-serif"> CT </font></b></u></font></font></font></div><div><div><div><div><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b><font color="#500050" face="arial, sans-serif"><br></font></b></p><p 
style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><b><font color="#500050">Where: </font></b><font color="#000000">Talk will be given </font><font color="#000000" style="font-weight:bold"><u>live, in-person</u></font><font style="font-weight:bold"> </font>at<br></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font color="#500050"> </font><font color="#000000"> TTIC, 6045 S. Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000"> 5th Floor, <b>Room 529 </b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px">Virtually:</b><span style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px"> </span><span style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#3c4043"> </font></span><i style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#000000">livestream via 
</font><b style="color:rgb(0,0,255)"><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d72690c6-1754-49c1-9107-b2c90104a6ce" target="_blank">panopto</a></b></i></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="letter-spacing:0.2px"><b style="font-style:italic;font-family:arial,sans-serif;color:rgb(0,0,255)"> </b></span><b style="letter-spacing:0.2px"><font size="1" face="tahoma, sans-serif"> </font></b><b style="color:rgb(60,64,67);letter-spacing:0.2px"><font face="arial, sans-serif"> </font></b></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="color:rgb(80,0,80);vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> <font color="#500050"> </font><font color="#000000"><font color="#500050"> </font></font></font></font></font>Joseph (Yossi) Keshet, Technion</p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><div style="border-top:none;border-right:none;border-left:none;border-bottom:2.25pt solid rgb(11,118,159);padding:0in 0in 1pt"></div><div><br></div></div></div><div><div dir="ltr"><b>Title:</b> From Raw Waveform to Spectrum: Practical and Theoretical Advances in Diffusion Models for Speech Generation<br><b><br></b></div><div dir="ltr"><b>Abstract:</b> In this talk, I will present two complementary contributions that push the boundaries of diffusion models for speech generation. 
I will start by presenting DiffAR, an autoregressive diffusion model capable of generating high-fidelity raw speech waveforms end-to-end. By operating directly in the waveform domain and conditioning on overlapping frames, DiffAR achieves coherent, expressive, and naturally varied speech generation. Specifically, it allows the creation of local acoustic behaviors, like vocal fry, which makes the overall waveform sound more natural.<br><br>Second, I will introduce a novel spectral analysis framework that interprets the inference process of diffusion models through a frequency-domain lens. This perspective enables principled design of noise schedules that are aligned with the spectral characteristics of the target data, replacing empirical heuristics with theoretically grounded methods.<br><br>These works were conducted in collaboration with Roi Benita and Michael Elad, and are detailed in the following papers:<p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2310.01381" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2310.01381</font></a></p><p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2502.00180" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2502.00180</font></a></p><p style="color:rgb(31,31,31)"><font face="arial, sans-serif"><b>Bio: </b>Joseph (Yossi) Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering from Tel Aviv University in 1994 and 2002, respectively. He completed his Ph.D. in Computer Science in 2008 at the School of Computer Engineering, The Hebrew University of Jerusalem. 
From 2008 to 2009, he was a postdoctoral researcher at EPFL and the IDIAP Research Institute in Switzerland. He then served as a Research Assistant Professor at TTIC from 2009 to 2012. Between 2013 and 2022, he was an Associate Professor in the Department of Computer Science at Bar-Ilan University. Since 2022, he has been an Associate Professor at the Faculty of Electrical and Computer Engineering at the Technion. His research interests include speech recognition, speech synthesis, and speech analysis.</font></p></div></div></div></div><div><div><font face="arial, sans-serif"><b>Host: </b><a href="mailto:klivescu@ttic.edu" rel="noreferrer" target="_blank"><b>Karen Livescu</b></a></font></div></div><div><br></div><div><br></div><br><br clear="all"></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. 
Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 30, 2025 at 9:21 AM Mary Marre <<a href="mailto:mmarre@ttic.edu" target="_blank">mmarre@ttic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><div style="font-size:small"><font style="font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> Wednes</font><span class="gmail_default" style="font-family:arial,sans-serif;color:rgb(0,0,0)">day, April 30,<span class="gmail_default"> </span>2025</span><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> at</font><b><font color="#000000" style="font-family:arial,sans-serif"> <u style="background-color:rgb(255,255,0)">10:00</u></font></b><font color="#000000"><u><b><font face="arial, sans-serif"><span style="background-color:rgb(255,255,0)"> am</span></font></b><b style="background-color:rgb(255,255,0)"><font face="arial, sans-serif"> CT </font></b></u></font></font></font></div><div><div><div><div><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b><font color="#500050" face="arial, sans-serif"><br></font></b></p><p 
style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><b><font color="#500050">Where: </font></b><font color="#000000">Talk will be given </font><font color="#000000" style="font-weight:bold"><u>live, in-person</u></font><font style="font-weight:bold"> </font>at<br></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font color="#500050"> </font><font color="#000000"> TTIC, 6045 S. Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000"> 5th Floor, <b>Room 529 </b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px">Virtually:</b><span style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px"> </span><span style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#3c4043"> </font></span><i style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#000000">livestream via 
</font><b style="color:rgb(0,0,255)"><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d72690c6-1754-49c1-9107-b2c90104a6ce" target="_blank">panopto</a></b></i></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="letter-spacing:0.2px"><b style="font-style:italic;font-family:arial,sans-serif;color:rgb(0,0,255)"> </b></span><b style="letter-spacing:0.2px"><font size="1" face="tahoma, sans-serif"> </font></b><b style="color:rgb(60,64,67);letter-spacing:0.2px"><font face="arial, sans-serif"> </font></b></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="color:rgb(80,0,80);vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> <font color="#500050"> </font><font color="#000000"><font color="#500050"> </font></font></font></font></font>Joseph (Yossi) Keshet, Technion</p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><div style="border-top:none;border-right:none;border-left:none;border-bottom:2.25pt solid rgb(11,118,159);padding:0in 0in 1pt"></div><div><br></div></div></div><div><div dir="ltr"><b>Title:</b> From Raw Waveform to Spectrum: Practical and Theoretical Advances in Diffusion Models for Speech Generation<br><b><br></b></div><div dir="ltr"><b>Abstract:</b> In this talk, I will present two complementary contributions that push the boundaries of diffusion models for speech generation. 
I will start by presenting DiffAR, an autoregressive diffusion model capable of generating high-fidelity raw speech waveforms end-to-end. By operating directly in the waveform domain and conditioning on overlapping frames, DiffAR achieves coherent, expressive, and naturally varied speech generation. Specifically, it can produce local acoustic behaviors, such as vocal fry, which make the overall waveform sound more natural.<br><br>Second, I will introduce a novel spectral analysis framework that interprets the inference process of diffusion models through a frequency-domain lens. This perspective enables principled design of noise schedules that are aligned with the spectral characteristics of the target data, replacing empirical heuristics with theoretically grounded methods.<br><br>These works were conducted in collaboration with Roi Benita and Michael Elad, and are detailed in the following papers:<p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2310.01381" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2310.01381</font></a></p><p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2502.00180" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2502.00180</font></a></p><p style="color:rgb(31,31,31)"><font face="arial, sans-serif"><b>Bio: </b>Joseph (Yossi) Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering from Tel Aviv University in 1994 and 2002, respectively. He completed his Ph.D. in Computer Science in 2008 at the School of Computer Engineering, The Hebrew University of Jerusalem. 
From 2008 to 2009, he was a postdoctoral researcher at EPFL and the IDIAP Research Institute in Switzerland. He then served as a Research Assistant Professor at TTIC from 2009 to 2012. Between 2013 and 2022, he was an Associate Professor in the Department of Computer Science at Bar-Ilan University. Since 2022, he has been an Associate Professor at the Faculty of Electrical and Computer Engineering at the Technion. His research interests include speech recognition, speech synthesis, and speech analysis.</font></p></div></div></div></div><div><div><font face="arial, sans-serif"><b>Host: </b><a href="mailto:klivescu@ttic.edu" rel="noreferrer" target="_blank"><b>Karen Livescu</b></a></font></div></div><div><br></div><div><br></div><br clear="all"></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. 
Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 29, 2025 at 2:08 PM Mary Marre <<a href="mailto:mmarre@ttic.edu" target="_blank">mmarre@ttic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><div style="font-size:small"><font style="font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> Wednes</font><span class="gmail_default" style="font-family:arial,sans-serif;color:rgb(0,0,0)">day, April <span>30</span>,<span class="gmail_default"> </span>2025</span><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> at</font><b><font color="#000000" style="font-family:arial,sans-serif"> <u style="background-color:rgb(255,255,0)">10:00</u></font></b><font color="#000000"><u><b><font face="arial, sans-serif"><span style="background-color:rgb(255,255,0)"> am</span></font></b><b style="background-color:rgb(255,255,0)"><font face="arial, sans-serif"> CT </font></b></u></font></font></font></div><div><div><div><div><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b><font color="#500050" face="arial, sans-serif"><br></font></b></p><p 
style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><b><font color="#500050">Where: </font></b><font color="#000000">Talk will be given </font><font color="#000000" style="font-weight:bold"><u>live, in-person</u></font><font style="font-weight:bold"> </font>at<br></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font color="#500050"> </font><font color="#000000"> TTIC, 6045 S. Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000"> 5th Floor, <b>Room 529 </b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px">Virtually:</b><span style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px"> </span><span style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#3c4043"> </font></span><i style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#000000">livestream via 
</font><b style="color:rgb(0,0,255)"><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d72690c6-1754-49c1-9107-b2c90104a6ce" target="_blank">panopto</a></b></i></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="letter-spacing:0.2px"><b style="font-style:italic;font-family:arial,sans-serif;color:rgb(0,0,255)"> </b></span><b style="letter-spacing:0.2px"><font size="1" face="tahoma, sans-serif"> </font></b><b style="color:rgb(60,64,67);letter-spacing:0.2px"><font face="arial, sans-serif"> </font></b></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="color:rgb(80,0,80);vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> <font color="#500050"> </font><font color="#000000"><font color="#500050"> </font></font></font></font></font>Joseph (Yossi) Keshet, Technion</p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><div style="border-top:none;border-right:none;border-left:none;border-bottom:2.25pt solid rgb(11,118,159);padding:0in 0in 1pt"></div><div><br></div></div></div><div><div dir="ltr"><b>Title:</b> From Raw Waveform to Spectrum: Practical and Theoretical Advances in Diffusion Models for Speech Generation<br><b><br></b></div><div dir="ltr"><b>Abstract:</b> In this talk, I will present two complementary contributions that push the boundaries of diffusion models for speech generation. 
I will start by presenting DiffAR, an autoregressive diffusion model capable of generating high-fidelity raw speech waveforms end-to-end. By operating directly in the waveform domain and conditioning on overlapping frames, DiffAR achieves coherent, expressive, and naturally varied speech generation. Specifically, it can produce local acoustic behaviors, such as vocal fry, which make the overall waveform sound more natural.<br><br>Second, I will introduce a novel spectral analysis framework that interprets the inference process of diffusion models through a frequency-domain lens. This perspective enables principled design of noise schedules that are aligned with the spectral characteristics of the target data, replacing empirical heuristics with theoretically grounded methods.<br><br>These works were conducted in collaboration with Roi Benita and Michael Elad, and are detailed in the following papers:<p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2310.01381" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2310.01381</font></a></p><p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2502.00180" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2502.00180</font></a></p><p style="color:rgb(31,31,31)"><font face="arial, sans-serif"><b>Bio: </b>Joseph (Yossi) Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering from Tel Aviv University in 1994 and 2002, respectively. He completed his Ph.D. in Computer Science in 2008 at the School of Computer Engineering, The Hebrew University of Jerusalem. 
From 2008 to 2009, he was a postdoctoral researcher at EPFL and the IDIAP Research Institute in Switzerland. He then served as a Research Assistant Professor at TTIC from 2009 to 2012. Between 2013 and 2022, he was an Associate Professor in the Department of Computer Science at Bar-Ilan University. Since 2022, he has been an Associate Professor at the Faculty of Electrical and Computer Engineering at the Technion. His research interests include speech recognition, speech synthesis, and speech analysis.</font></p></div></div></div></div><div><div><font face="arial, sans-serif"><b>Host: </b><a href="mailto:klivescu@ttic.edu" rel="noreferrer" target="_blank"><b>Karen Livescu</b></a></font></div></div><div><br></div><div><br></div><div><br></div><div><br></div><br clear="all"></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. 
Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 24, 2025 at 12:17 PM Mary Marre <<a href="mailto:mmarre@ttic.edu" target="_blank">mmarre@ttic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><div style="font-size:small"><font style="font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> Wednes</font><span class="gmail_default" style="font-family:arial,sans-serif;color:rgb(0,0,0)">day, April 30,<span class="gmail_default"> </span>2025</span><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> at</font><b><font color="#000000" style="font-family:arial,sans-serif"> <u style="background-color:rgb(255,255,0)">10:00</u></font></b><font color="#000000"><u><b><font face="arial, sans-serif"><span style="background-color:rgb(255,255,0)"> am</span></font></b><b style="background-color:rgb(255,255,0)"><font face="arial, sans-serif"> CT </font></b></u></font></font></font></div><div><div><div><div><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b><font color="#500050" face="arial, sans-serif"><br></font></b></p><p 
style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><b><font color="#500050">Where: </font></b><font color="#000000">Talk will be given </font><font color="#000000" style="font-weight:bold"><u>live, in-person</u></font><font style="font-weight:bold"> </font>at<br></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font color="#500050"> </font><font color="#000000"> TTIC, 6045 S. Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000"> 5th Floor, <b>Room 529 </b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px">Virtually:</b><span style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px"> </span><span style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#3c4043"> </font></span><i style="font-family:arial,sans-serif;letter-spacing:0.2px"><font color="#000000">livestream via 
</font><b style="color:rgb(0,0,255)"><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d72690c6-1754-49c1-9107-b2c90104a6ce" target="_blank">panopto</a></b></i></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="letter-spacing:0.2px"><b style="font-style:italic;font-family:arial,sans-serif;color:rgb(0,0,255)"> </b></span><b style="letter-spacing:0.2px"><font size="1" face="tahoma, sans-serif"> </font></b><b style="color:rgb(60,64,67);letter-spacing:0.2px"><font face="arial, sans-serif"> </font></b></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="color:rgb(80,0,80);vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> <font color="#500050"> </font><font color="#000000"><font color="#500050"> </font></font></font></font></font>Joseph<span style="background-color:rgb(255,255,255)"> (Yossi)</span> Keshet, Technion</p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><div style="border-top:none;border-right:none;border-left:none;border-bottom:2.25pt solid rgb(11,118,159);padding:0in 0in 1pt"></div><div><br></div></div></div><div><div dir="ltr"><b>Title:</b> From Raw Waveform to Spectrum: Practical and Theoretical Advances in Diffusion Models for Speech Generation<br><b><br></b></div><div dir="ltr"><b>Abstract:</b> In this talk, I will present two complementary contributions that push the boundaries of diffusion models for speech generation. 
I will start by presenting DiffAR, an autoregressive diffusion model capable of generating high-fidelity raw speech waveforms end-to-end. By operating directly in the waveform domain and conditioning on overlapping frames, DiffAR achieves coherent, expressive, and naturally varied speech generation. Specifically, it can produce local acoustic behaviors, such as vocal fry, which make the overall waveform sound more natural.<br><br>Second, I will introduce a novel spectral analysis framework that interprets the inference process of diffusion models through a frequency-domain lens. This perspective enables principled design of noise schedules that are aligned with the spectral characteristics of the target data, replacing empirical heuristics with theoretically grounded methods.<br><br>These works were conducted in collaboration with Roi Benita and Michael Elad, and are detailed in the following papers:<p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2310.01381" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2310.01381</font></a></p><p style="margin:0px;line-height:normal;font-size-adjust:none;font-kerning:auto;font-variant-alternates:normal;font-variant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:normal"><a href="https://arxiv.org/abs/2502.00180" target="_blank"><font face="arial, sans-serif">https://arxiv.org/abs/2502.00180</font></a></p><p style="color:rgb(31,31,31)"><font face="arial, sans-serif"><b>Bio: </b>Joseph (Yossi) Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering from Tel Aviv University in 1994 and 2002, respectively. He completed his Ph.D. in Computer Science in 2008 at the School of Computer Engineering, The Hebrew University of Jerusalem. 
From 2008 to 2009, he was a postdoctoral researcher at EPFL and the IDIAP Research Institute in Switzerland. He then served as a Research Assistant Professor at TTIC from 2009 to 2012. Between 2013 and 2022, he was an Associate Professor in the Department of Computer Science at Bar-Ilan University. Since 2022, he has been an Associate Professor at the Faculty of Electrical and Computer Engineering at the Technion. His research interests include speech recognition, speech synthesis, and speech analysis.</font></p></div></div></div></div><div><div><font face="arial, sans-serif"><b>Host: </b><a href="mailto:klivescu@ttic.edu" rel="noreferrer" target="_blank"><b>Karen Livescu</b></a></font></div></div><div><br></div><div><br></div><br clear="all"></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div></div>
</div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div></div>