[Colloquium] Sellie/Dissertation Defense/5-16-08

Margaret Jaffey margaret at cs.uchicago.edu
Wed Apr 30 09:34:46 CDT 2008


			Department of Computer Science/The University of Chicago

					*** Dissertation Defense ***


Candidate:  Linda Sellie

Date:  Friday, May 16, 2008 (tentative*)

Time and Location:  10:00 a.m. in Ry 276 (tentative*)

Title:  Learning Random DNF over the Uniform Distribution

Abstract:
This thesis presents a new result for exactly learning the average monotone c log(n)-DNF formulae and average c log(n)-DNF formulae from random examples drawn from the uniform distribution.

Learning DNF with a DNF hypothesis has often been called one of the most important problems in computational learning theory. Valiant, who founded the field of computational learning theory, stated in his seminal paper [?]: “Such expressions appear particularly easy for humans to comprehend. Hence we expect that any practical learning system would have to allow for them.” Yet learning DNF by DNF has proved elusive; learning k-DNF by k-DNF is NP-hard even with membership queries and under the uniform distribution [?]. In this thesis we prove that, despite the fact that this class is NP-hard to learn, the average member of this class where the terms are of a fixed size is easy to learn exactly, even when only provided with random examples from the uniform distribution.

We show that randomly generated monotone c log(n)-DNF formulae and randomly generated c log(n)-DNF formulae can be exactly learned in probabilistic polynomial time over the uniform distribution. Our notion of randomly generated is with respect to a uniform distribution.
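As an illustrative sketch only (not the thesis's actual sampling procedure), one can picture this model in code: each term of a monotone c log(n)-DNF is a uniformly random set of roughly c log2(n) variables, and the learner receives examples drawn uniformly from {0,1}^n, labeled by the formula. The function names and the choice of n terms per formula below are assumptions for illustration.

```python
import math
import random

def random_monotone_dnf(n, c=1.0, num_terms=None, rng=random):
    """Sample a monotone DNF whose terms are uniformly random sets of
    about c*log2(n) variables (an illustrative model, not necessarily
    the exact distribution used in the thesis)."""
    term_size = max(1, round(c * math.log2(n)))
    if num_terms is None:
        num_terms = n  # illustrative choice of formula length
    return [frozenset(rng.sample(range(n), term_size)) for _ in range(num_terms)]

def evaluate(dnf, x):
    """A DNF is satisfied iff some term has all its variables set to 1."""
    return any(all(x[i] for i in term) for term in dnf)

def uniform_labeled_example(dnf, n, rng=random):
    """Draw x uniformly from {0,1}^n and label it with the formula."""
    x = [rng.randrange(2) for _ in range(n)]
    return x, evaluate(dnf, x)
```

The learner in this setting sees only a stream of such (x, label) pairs; it gets no membership queries and no control over the examples.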

To prove this we identify the class of well-behaved c log(n)-DNF formulae, and the prominent property of terms in a DNF formula. We show that almost every monotone DNF formula and almost every DNF formula is well-behaved, and is composed entirely of prominent terms. We prove that there exists a probabilistic Turing machine that exactly learns the prominent terms of a well-behaved c log(n)-DNF formula. This thesis presents the first algorithm for exactly learning a large class of DNF formulae when only given access to labeled examples drawn from the uniform distribution.

A key contribution of this thesis is to show that there exists a simple test that finds subsets of terms in most k-DNF formulae (monotone or non-monotone).
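The announcement does not describe the test itself. Purely as a hypothetical illustration of the kind of statistical test available from uniform examples, one can estimate how much fixing a candidate subset of variables to 1 raises the probability that the formula outputs 1: variables that appear together in a term lift this conditional rate well above the base rate. The function name and thresholds below are assumptions, not the thesis's method.

```python
import random

def conditional_positive_rate(f, n, subset, samples=2000, rng=random):
    """Estimate Pr[f(x)=1 | x_i=1 for all i in subset] under the uniform
    distribution: for uniform x, forcing the bits in `subset` to 1 and
    sampling the rest uniformly realizes exactly that conditional law."""
    hits = 0
    for _ in range(samples):
        x = [rng.randrange(2) for _ in range(n)]
        for i in subset:
            x[i] = 1
        hits += 1 if f(x) else 0
    return hits / samples
```

Comparing this estimate against the unconditioned rate (subset = ()) gives a crude signal of whether the chosen variables co-occur in a term.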

Candidate's Advisor:  Prof. Stuart A. Kurtz

A draft copy of Ms. Sellie's dissertation is available in Ry 156.

*  Note: Linda Sellie's defense is tentatively scheduled for Friday, May 16th at 10:00 a.m.  It is possible that her defense will be held on a different day during the week of May 12th and at a different time.  If the day or time changes, a revised announcement will be sent out.


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey                             margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)        (773) 702-6011
The University of Chicago                  http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=




