<div dir="ltr"><div dir="ltr"><div class="gmail_default"><div><p style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;color:rgb(80,0,80);margin:0px"><font face="arial, sans-serif" color="#000000"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b>    </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit">    Thursday, March 10th at <b style="background-color:rgb(255,255,0)">11am CT</b></font></font><br></font></p><p style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;color:rgb(80,0,80);margin:0px"><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>Where:</b>      <b> </b></font></font><b><font color="#000000">Zoom Virtual Talk</font></b><font color="#500050"> (</font><a href="https://uchicagogroup.zoom.us/webinar/register/WN_ABZkmKvrRmiHQsow5E7jmw" target="_blank"><b><font color="#0000ff">register in advance here</font></b></a><font style="color:rgb(80,0,80)">)</font></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font 
color="#000000"><b>Who: </b> </font><font color="#500050">    </font><font color="#000000">    <span>SouYoung</span> Jin, Massachusetts Institute of Technology</font></font></font></font></p></div><div><br></div><div><font color="#000000"><b style="font-family:arial,sans-serif">Title:</b><span style="font-family:arial,sans-serif">          </span>Cross-Modal Learning for Video Understanding<br></font></div></div><div class="gmail_default"><p><font color="#000000"><font face="arial, sans-serif"><b>Abstract:  </b></font>Videos are good sources of knowledge about things we have not yet experienced. They also show many aspects of human life. Videos have multiple sources of sensory information. Building a video understanding system requires computer vision components, such as object detection and recognition, and knowledge from other domains such as spoken/natural language processing and cognitive science. Cross-modal learning is a way of learning that involves information obtained from more than one modality. In this talk, I will introduce two recent projects on cross-modal learning for video understanding. In particular, I will talk about the "Spoken Moments" project, where my collaborators and I collected spoken descriptions of 500K short videos, to capture natural and concise descriptions on a large scale. We designed the study to collect only descriptions of events that stood out in participants’ memory, as we were particularly interested in the video content that human annotators pay attention to. Using pairs of video and spoken descriptions, we trained a model with a cross-modal learning architecture to understand the video content, leading to more human-like understanding. The model trained on Spoken Moments generalizes strongly to other datasets. 
I will also present our approaches to model training and future projects in video understanding.</font></p><p><font face="arial, sans-serif"><b><font color="#000000">Bio:</font></b><b style="color:rgb(60,64,67)">  </b><font color="#000000"><span>SouYoung</span> Jin is</font></font><font color="#000000"> a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT), working with Dr. Aude Oliva. Previously, she earned her PhD from the College of Information and Computer Sciences (CICS) at the University of Massachusetts Amherst (UMass Amherst), where she worked on improving face clustering in videos under Dr. Erik Learned-Miller in the Computer Vision Lab. Her main research areas are Computer Vision, Machine Learning, and Cognitive Science.</font></p><p><font color="#000000"><b style="font-family:arial,sans-serif">H</b><span class="gmail_default" style="font-family:arial,sans-serif"><b>ost: <a href="mailto:greg@ttic.edu" target="_blank">Greg </a></b></span></font><span style="font-family:Roboto,RobotoDraft,Helvetica,Arial,sans-serif;font-size:0.875rem;font-weight:bold;letter-spacing:0.2px;white-space:nowrap"><a href="mailto:greg@ttic.edu" target="_blank"><font color="#000000">Shakhnarov</font><font color="#202124">ich</font></a></span></p><font color="#888888"><p style="color:rgb(60,64,67)"><br></p><p style="color:rgb(60,64,67)"><br></p></font></div><div><div dir="ltr" data-smartmail="gmail_signature"><div dir="ltr"><b style="background-color:rgb(255,255,255)"><font color="#3d85c6">Brandie Jones </font></b><div><div><i><b style="background-color:rgb(255,255,255)"><font color="#3d85c6">Faculty Administrative Support</font></b></i></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">Toyota Technological Institute</font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">6045 S. 
Kenwood Avenue</font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">Chicago, IL  60637</font></span></div></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6"><a href="http://www.ttic.edu" target="_blank">www.ttic.edu</a> </font></span></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 2, 2022 at 1:49 PM Brandie Jones <<a href="mailto:bjones@ttic.edu" target="_blank">bjones@ttic.edu</a>> wrote:<br></div></div></div>