[Colloquium] 4/1 TTIC Colloquium: Devi Parikh, Virginia Tech

Mary Marre mmarre at ttic.edu
Fri Mar 25 14:32:18 CDT 2016


*When:*    Friday, April 1st at 11:00 a.m.

*Where:*    TTIC, 6045 S Kenwood Avenue, 5th Floor, Room 526

*Speaker:*  Devi Parikh, Virginia Tech


*Title:*      Words, Pictures, and Common Sense


*Abstract:*
Wouldn't it be nice if machines could understand content in images and
communicate this understanding as effectively as humans? Such technology
would be immensely powerful, be it helping a visually impaired user
navigate a world built by the sighted, assisting an analyst in extracting
relevant information from a surveillance feed, educating a child playing a
game on a touch screen, providing information to a spectator at an art
gallery, or interacting with a robot. As computer vision and natural
language processing techniques are maturing, we are closer to achieving
this dream than we have ever been.

In this talk, I will present two ongoing thrusts in my lab that push the
boundaries of AI capabilities at the intersection of vision, language, and
commonsense reasoning.

Visual Question Answering (VQA): I will describe the task, our dataset (the
largest and most complex of its kind), and our model for free-form and
open-ended VQA. Given an image and a natural
language question about the image (e.g., “What kind of store is this?”,
“How many people are waiting in the queue?”, “Is it safe to cross the
street?”), the machine’s task is to automatically produce an accurate
natural language answer (“bakery”, “5”, “Yes”). We have collected and
recently released a dataset containing >250,000 images, >760,000 questions,
and ~10 million answers. Our dataset is enabling the next generation of AI
systems, often based on deep learning techniques, that understand images
and language and perform complex reasoning, both in our lab and in the
community at large.
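
To make the task format concrete, here is a minimal Python sketch (not
code from the speaker's lab; the record and its answer list are
hypothetical). It pairs an image with a question and the answers
collected from multiple annotators, and scores a prediction with a
consensus-style accuracy in the spirit of the metric released with the
dataset, where a predicted answer earns full credit once enough
annotators agree with it:

    # Hypothetical VQA record: one image, one question, answers from
    # multiple human annotators.
    example = {
        "image": "bakery_storefront.jpg",  # hypothetical file name
        "question": "What kind of store is this?",
        "human_answers": ["bakery", "bakery", "bakery", "bakery",
                          "pastry shop", "bakery", "bakery", "bakery",
                          "cake shop", "bakery"],
    }

    def vqa_accuracy(predicted, human_answers):
        # Consensus-style scoring: full credit if at least 3 annotators
        # gave the predicted answer, partial credit otherwise.
        matches = sum(a == predicted for a in human_answers)
        return min(matches / 3.0, 1.0)

    print(vqa_accuracy("bakery", example["human_answers"]))     # 1.0
    print(vqa_accuracy("cake shop", example["human_answers"]))  # ~0.33

Partial credit reflects genuine annotator disagreement on free-form,
open-ended answers (e.g., "pastry shop" vs. "bakery").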

Learning Common Sense Through Visual Abstraction: Common sense is a key
ingredient in building intelligent machines that make "human-like"
decisions when performing tasks -- be it automatically answering natural
language questions, or understanding images and videos. How can machines
learn this common sense? While some of this knowledge is explicitly stated
in human-generated text (books, articles, blogs, etc.), much of this
knowledge is unwritten. While unwritten, it is not unseen! The visual world
around us is full of structure bound by commonsense laws. But machines
today cannot learn common sense directly by observing our visual world
because they cannot accurately perform detailed visual recognition in
images and videos. This leads to a chicken-and-egg problem: we would like
to learn common sense to allow machines to understand images accurately,
but in order to learn common sense, we need accurate image parsing. We
argue that the solution is to give up on photorealism. We propose to
leverage abstract scenes -- cartoon scenes assembled from clip art by
crowdsourced workers -- to teach our machines common sense.
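
As a toy illustration of why abstraction sidesteps the chicken-and-egg
problem, consider this sketch (a hypothetical schema, not the actual
dataset format): a clip-art scene is fully specified by its objects and
their placements, so relations that would require difficult image
parsing in a photograph can be read off exactly:

    # Hypothetical abstract scene: clip-art objects with known
    # 2D placements; no visual recognition is needed to query them.
    scene = [
        {"clipart": "boy",     "x": 120, "y": 300},
        {"clipart": "dog",     "x": 180, "y": 310},
        {"clipart": "frisbee", "x": 150, "y": 150},
    ]

    def near(a, b, threshold=100):
        # Two objects are "near" if their placements are within a
        # Manhattan-distance threshold (an illustrative choice).
        return abs(a["x"] - b["x"]) + abs(a["y"] - b["y"]) <= threshold

    boy, dog, frisbee = scene
    print(near(boy, dog))      # True:  60 + 10  = 70  <= 100
    print(near(boy, frisbee))  # False: 30 + 150 = 180 >  100

Commonsense regularities (e.g., dogs tend to be near the people playing
with them) can then be mined from many such scenes without any image
parsing.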

*Bio:*

Devi Parikh is an Assistant Professor in the Bradley Department of
Electrical and Computer Engineering at Virginia Tech (VT), and an Allen
Distinguished Investigator of Artificial Intelligence. She leads the
Computer Vision Lab at VT, and is also a member of the Virginia Center for
Autonomous Systems (VaCAS) and the VT Discovery Analytics Center (DAC).

Prior to this, she was a Research Assistant Professor at Toyota
Technological Institute at Chicago (TTIC), an academic computer science
institute affiliated with the University of Chicago. She has held visiting
positions at Cornell University, University of Texas at Austin, Microsoft
Research, MIT, and Carnegie Mellon University. She received her M.S. and
Ph.D. degrees from the Electrical and Computer Engineering department at
Carnegie Mellon University in 2007 and 2009, respectively. She received her
B.S. in Electrical and Computer Engineering from Rowan University in 2005.

Her research interests include computer vision, pattern recognition, and AI
in general, and visual recognition problems in particular. Her recent work
involves leveraging human-machine collaboration for building smarter
machines, and exploring problems at the intersection of vision and
language. She has also worked on other topics such as ensembles of
classifiers, data fusion, inference in probabilistic models, 3D reassembly,
barcode segmentation, computational photography, interactive computer
vision, contextual reasoning, hierarchical representations of images, and
human-debugging.

She is a recipient of an NSF CAREER award, a Sloan Research Fellowship, an
Office of Naval Research (ONR) Young Investigator Program (YIP) award, an
Army Research Office (ARO) Young Investigator Program (YIP) award, an Allen
Distinguished Investigator Award in Artificial Intelligence from the Paul
G. Allen Family Foundation, three Google Faculty Research Awards, an
Outstanding New Assistant Professor award from the College of Engineering
at Virginia Tech, and a Marr Best Paper Prize awarded at the International
Conference on Computer Vision (ICCV).

http://computing.ece.vt.edu/~parikh/





Host: Greg Shakhnarovich, greg at ttic.edu





For more information on the colloquium series or to subscribe to the
mailing list, please see http://www.ttic.edu/colloquium.php





Mary C. Marre
Administrative Assistant
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 504*
*Chicago, IL  60637*
*p:(773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu <mmarre at ttic.edu>*