[Colloquium] TTIC Colloquium: Geoff Hinton, University of Toronto & Google

Fri Jan 16 11:16:58 CST 2015

When:     Friday, January 23, 2015 at 4:00pm

Where:    TTIC, 6045 S Kenwood Avenue, 5th Floor, Room 526

Who:       Geoff Hinton, University of Toronto & Google

Title:       "What's wrong with convolutional nets?"

Abstract:

I will describe a new type of neural net that uses small
groups of neurons called capsules. A capsule outputs the probability
that an instance of the kind of entity it detects is currently present,
but it also outputs a generalized pose vector that represents
attributes of the instantiated entity such as its precise position,
orientation, scale, colour, deformation, motion, etc. A capsule
receives input vectors from lower-level capsules, transforms them with
weight matrices, and looks for sharp agreement among a subset of the
transformed vectors. It then produces an output vector that is the
average of the transformed vectors that agree, and an output
probability that is determined by the sharpness of the agreement. This "Hough"
non-linearity is very different from the ones standardly used in
artificial neural nets and it has a very attractive property called
"coincidence filtering". For example, a capsule can recognize a
familiar shape by detecting agreement between the predictions that
different parts of the shape make for the generalized pose of the
whole shape, and it can completely ignore irrelevant predictions. This
way of recognizing shapes generalizes naturally to very different
viewpoints because all the poses change in the same way, so the
coincidence remains.
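
As a rough illustration of the agreement step described above, the
following Python sketch lets each part vote for the pose of the whole
and keeps only the votes that coincide. It is not the speaker's
implementation: the distance threshold `radius`, the probability formula,
the helper names, and the use of homogeneous (x, y, 1) poses with simple
offset matrices are all assumptions made for the example.

```python
import math

def transform(W, pose):
    """Multiply a weight matrix (a list of rows) by a pose vector."""
    return [sum(w * x for w, x in zip(row, pose)) for row in W]

def capsule_output(part_poses, weights, radius=0.5):
    """Toy agreement step: return (pose, probability) for one capsule.

    Assumption: votes "agree" when they lie within `radius` of one vote.
    """
    votes = [transform(W, p) for W, p in zip(weights, part_poses)]
    # Find the largest group of votes that sit within `radius` of one vote.
    cluster = max(
        ([u for u in votes if math.dist(v, u) <= radius] for v in votes),
        key=len,
    )
    if len(cluster) < 2:
        return None, 0.0  # a lone vote is not a coincidence
    dim = len(cluster[0])
    pose = [sum(u[k] for u in cluster) / len(cluster) for k in range(dim)]
    # Assumed scoring: fraction of votes that agree, discounted by spread.
    spread = sum(math.dist(pose, u) for u in cluster) / len(cluster)
    probability = (len(cluster) / len(votes)) * math.exp(-spread / radius)
    return pose, probability

def offset(dx, dy):
    """Hypothetical part-to-whole matrix: shift (x, y, 1) by (dx, dy)."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]

# Three parts predict the same pose for the whole; the fourth is irrelevant.
weights = [offset(1, 0), offset(0, -2), offset(-1, 1), offset(2, 2)]
parts = [[4.0, 5.0, 1.0], [5.0, 7.0, 1.0], [6.0, 4.0, 1.0], [0.0, 0.0, 1.0]]
pose, probability = capsule_output(parts, weights)

# Shift every part by the same amount: the votes still coincide.
shifted = [[x + 10.0, y - 3.0, 1.0] for x, y, _ in parts]
shifted_pose, shifted_probability = capsule_output(shifted, weights)
```

In this toy setup the three consistent votes determine the output pose and
the outlier is ignored. Moving all parts together moves the output pose by
the same amount and leaves the probability unchanged. Only translation is
covered here; handling rotation or scale would need richer pose matrices.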

The output of a shape-detecting capsule is initially shared among many
higher-level capsules to detect larger, more complex shapes.
Higher-level capsules that converge on a tight cluster then demand a
larger share of the output from the lower-level capsules that
contribute to the cluster and a smaller share from the lower-level
capsules that produce outliers. I shall argue that this
"routing-by-agreement" is a much better way of routing each part of
the visual input to those high-level neurons that know how to deal
with it than "max-pooling", which is the primitive routing mechanism
currently used in convolutional neural nets. Stochastic gradient
descent in a deep hierarchy of convolutional capsules that use
adaptive routing should allow us to parse images using deep
parts-based models that do not require hand-engineering. (Joint work
with Navdeep Jaitly.)
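
The share-reallocation idea in the paragraph above can be sketched in a
few lines of Python. This is only one plausible reading, not the method
presented in the talk: the iteration count, the `sharpness` constant, the
exponential scoring rule, and the function name are assumptions chosen
for illustration.

```python
import math

def route_by_agreement(votes, iterations=3, sharpness=4.0):
    """Toy routing loop.

    votes[i][j] is lower-level capsule i's predicted pose (a list of
    floats) for higher-level capsule j. Returns shares[i][j], the fraction
    of capsule i's output sent to capsule j; each row sums to 1.
    """
    n_low, n_high = len(votes), len(votes[0])
    dim = len(votes[0][0])
    # Initially each output is shared equally among the higher capsules.
    shares = [[1.0 / n_high] * n_high for _ in range(n_low)]
    for _ in range(iterations):
        # Each higher capsule's cluster centre: share-weighted mean vote.
        centres = []
        for j in range(n_high):
            total = sum(shares[i][j] for i in range(n_low))
            centres.append([
                sum(shares[i][j] * votes[i][j][k] for i in range(n_low)) / total
                for k in range(dim)
            ])
        # Assumed rule: votes near a centre earn a larger share there.
        for i in range(n_low):
            scores = [
                math.exp(-sharpness * math.dist(votes[i][j], centres[j]))
                for j in range(n_high)
            ]
            norm = sum(scores)
            shares[i] = [s / norm for s in scores]
    return shares

# Capsules 0 and 1 agree about higher capsule 0; capsules 2 and 3 agree
# about higher capsule 1. All other votes are outliers.
votes = [
    [[5.0, 5.0], [0.0, 9.0]],
    [[5.2, 5.0], [9.0, 0.0]],
    [[0.0, 0.0], [3.0, 3.0]],
    [[9.0, 9.0], [3.2, 3.0]],
]
shares = route_by_agreement(votes)
```

With these made-up votes, the shares start out equal and, after a few
iterations, each lower-level capsule sends most of its output to the
higher-level capsule whose cluster it helped form.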

Host:  David McAllester, mcallester at ttic.edu

-- 
*Dawn Ellis*
Administrative Coordinator,
Bookkeeper
773-834-1757
dellis at ttic.edu

TTIC
6045 S. Kenwood Ave.
Chicago, IL. 60637