[Theory] TODAY: [Talks at TTIC] 11/6 Young Researcher Seminar Series: Yossi Gandelsman, UC Berkeley

Brandie Jones via Theory theory at mailman.cs.uchicago.edu
Wed Nov 6 09:00:00 CST 2024


*When:*    Wednesday, November 6th at 11AM CT

*Where:*   Talk will be given *live, in-person* at

                    TTIC, 6045 S. Kenwood Avenue

                    5th Floor, Room 530


*Virtually:* via Panopto (Livestream
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=0eaa4077-f605-4c22-9d4a-b1e800f9ac24>)

*Who:*      Yossi Gandelsman, UC Berkeley

*Title:*       Reverse Engineering CLIP

*Abstract:* In this talk, I reverse engineer the computation inside CLIP,
one of the most commonly used computer vision backbones. I analyze how
individual model components affect the final CLIP representation. I show
that the image representation can be decomposed as a sum across individual
image patches, model layers, neurons, and attention heads, and use CLIP’s
text representation to interpret the summands.
For the attention heads, each head's role can be characterized
by automatically finding text representations that span its output space,
which reveals property-specific roles for many heads (e.g. location or
shape). Next, interpreting the image patches uncovers an emergent spatial
localization within CLIP. Finally, the automatic description of the
contributions of individual neurons shows polysemantic behavior - each
neuron corresponds to multiple, often unrelated, concepts (e.g. ships and
cars).
The resulting understanding of these components supports three main
applications. First, the discovered head roles enable the removal of
spurious features from CLIP. Second, emergent localization is used for a
strong zero-shot image segmenter. Finally, the extracted neuron polysemy
allows the mass production of “semantic” adversarial examples by generating
images with concepts spuriously correlated to the incorrect class. The
results indicate that a scalable understanding of transformer models is
attainable and can be used to detect model bugs, repair them, and improve
them.

*Bio:* Yossi Gandelsman is a PhD student at Berkeley AI Research (UC
Berkeley), advised by Prof. Alexei Efros, and a visiting researcher at FAIR
(Meta). They work on computer vision and deep learning problems.
Previously, they were a member of the Perception Team at Google Research
(now Google-DeepMind). They completed their M.Sc. at the Faculty of
Mathematics and Computer Science of the Weizmann Institute of Science,
advised by Prof. Michal Irani, and a B.Sc. at the Open University of Israel.

*Host: Shiry Ginosar <shiry at ttic.edu>*

--
*Brandie Jones*
*Executive Administrative Assistant*
Toyota Technological Institute
6045 S. Kenwood Avenue
Chicago, IL  60637
www.ttic.edu