[Colloquium] Reminder: Tchoua/Dissertation Defense/Jul 2, 2019

Margaret Jaffey margaret at cs.uchicago.edu
Mon Jul 1 09:08:55 CDT 2019


This is a reminder about Roselyne Tchoua's dissertation defense
tomorrow.

       Department of Computer Science/The University of Chicago

                     *** Dissertation Defense ***


Candidate:  Roselyne Tchoua

Date:  Tuesday, July 2, 2019

Time:  11:00 AM

Place:  John Crerar Library (JCL) 298

Title: HYBRID HUMAN-MACHINE SCIENTIFIC INFORMATION EXTRACTION
FRAMEWORKS

Abstract:
A wealth of valuable research data is locked within the millions of
research articles published every year. Reading and extracting
pertinent information from those articles has become an unmanageable
task for scientists. Moreover, these data are loosely structured,
encoded in manuscripts of various formats, embedded in different
content types, and are, in general, not machine accessible. Thus,
studies that automatically leverage this valuable information are not
tractable or even possible. Current approaches employ humans to
manually extract data, define extraction rules, or annotate training
corpora for machine learning approaches through tedious,
time-consuming, error-prone and sometimes expensive processes. In the
specific case of scientific information extraction, the need for
pointed expertise increases costs and decreases the generalization of
extraction methods. This thesis studies hybrid human-computer
techniques for liberating scientific facts (entities and relations),
focusing in particular on leveraging computer and human strengths to
alleviate the burden on human curators, thereby also decreasing costs.
The emerging field of materials informatics has the potential to
greatly reduce time-to-market and development costs for new materials.
Such efforts rely on access to large databases of material properties
and therefore represent a suitable but not unique application for this
research. This work addresses the challenge of populating a database
of scientific facts by presenting three frameworks with different
levels of automation and human involvement. Specifically, the three
approaches involve varying amounts of untrained, trained, and expert
input in order to populate a database of polymer properties. DB is a
crowdsourcing framework that employs and assists a semi-expert crowd
to extract an important relation in polymer science. Increasing the
automation and targeting a different relation, the Tg framework uses a
variety of computer and human modules to supplement the output of a
well-performing natural language software and prioritize expert
curation. Having identified scientific named entity recognition
as a major challenge and prerequisite for relation extraction,
polyNER, the third framework, uses minimal, focused expert knowledge
to generate annotated, entity-rich corpora and bootstrap scientific
named entity classifiers. This work shows that systems combining
existing software and minimal human input can match state-of-the-art
domain-specific natural language processing software and demonstrates
the potential of hybrid human-computer partnership alternatives to
sometimes impractical state-of-the-art approaches. Given the lack of
generalizable automatic extraction methods, such approaches are
particularly valuable to bootstrap the extraction process for emerging
groups with specialized, previously unexplored information extraction
needs.

Roselyne's advisor is Prof. Ian Foster.

Log in to the Computer Science Department website for details,
including a draft copy of the dissertation:

 https://newtraell.cs.uchicago.edu/phd/phd_announcements#roselyne

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)               (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

