[Colloquium] TALK TODAY: Sharon Goldwater

Fri Oct 8 08:23:59 CDT 2010

DEPARTMENT OF COMPUTER SCIENCE

UNIVERSITY OF CHICAGO

Date: Friday, October 8, 2010
Time: 3:00 p.m.
Place: TBD

----------------------------------------------

Speaker:		Sharon Goldwater

From:		University of Edinburgh

Web page:	http://homepages.inf.ed.ac.uk/sgwater/

Title: 		From sounds to words: Bayesian modeling of early language acquisition

Abstract:  

The child learning language is faced with a difficult problem: given a set of specific linguistic observations, the learner must infer some abstract representation (a grammar) that generalizes correctly to novel observations and productions. In this talk, I argue that Bayesian computational models provide a principled way to examine the kinds of representations, biases, and sources of information that lead to successful learning. As an example, I discuss my work on modeling word segmentation. I first present a computational study exploring the effects of context on statistical word segmentation. In this study, a model that assumes words are statistically independent (as in the stimuli used in many human experiments) is compared to a model that defines words as units that help to predict following words. I show that the context-independent model undersegments the data, while the contextual model yields much more accurate segmentations, outperforming previous models on realistic corpus data. This difference suggests the need to consider contextual effects in infant word segmentation.

Simulations using corpus data provide insight into the kinds of information that are useful for learning, but it is also important to address the question of whether model predictions are consistent with human learning patterns. In the second part of this talk, I present results from experiments using artificial language stimuli similar to those of Saffran et al. (1996), but where the stimuli were varied between subjects to modify the difficulty of the task. The Bayesian model described above correlates better with human patterns of difficulty than any other model tested, suggesting that this model does indeed capture important properties of human segmentation.