[Colloquium] TTI-C Colloquium: Beata Beigman Klebanov, Northwestern

Mon Nov 23 09:48:17 CST 2009

When:             *Monday, Nov 23 @ 1:00pm*

Where:           *TTI-C Conference Room #526*, 6045 S Kenwood Ave

Who:               *Beata Beigman Klebanov*, Northwestern University

Title:             *Learning with Annotation Noise*

The success of supervised machine learning depends crucially on the quality
of the labeled data. In computational linguistics, labeled data often come
from human annotators analyzing texts for some latent content, like
syntactic structures or the ideological perspective of the author.

In such annotation projects, it is often assumed that the noise in the
annotated data is random, hence not much work is devoted to identifying the
noisy instances and assessing their impact on learning.

Building on a case study of patterns of disagreements between human
annotators in a binary classification task, we articulate an annotation
generation model that allows a theoretical discussion of the type of noise
introduced into the data through the annotation process.

We show that this noise differs from both random classification noise
and malicious noise, and discuss its properties as far as learning is
concerned.

We then generalize the model of annotation generation and show that it is a
good fit for an existing dataset recently used in a benchmark. We conclude
with a discussion of the implications of the findings for benchmarking
practices in computational linguistics and elsewhere.

Contact:          Joseph Keshet, TTI-C
jkeshet at tti-c.org
  834-6850