[Colloquium] CS Seminar today at 2:30 pm: Bill Howe, University of Washington

Sandra Wallace via Colloquium colloquium at mailman.cs.uchicago.edu
Wed Apr 4 09:35:20 CDT 2018


UNIVERSITY OF CHICAGO
DEPARTMENT OF COMPUTER SCIENCE
PRESENTS


Bill Howe 
University of Washington


Wednesday, April 4, 2018 at 2:30 pm 
Ryerson 251


Title: Algorithmic Curation at Scale: Putting Open Science Data to Work

Abstract:
Data in public repositories remains remarkably underused despite significant investments in open science. Making data available online turns out to be the easy part; making the data usable for science requires new algorithms to enable longitudinal, integrative analysis rather than just settling for keyword search.

We consider three core problems: poor-quality metadata in science due to the cost of expert human curation, semantic heterogeneity across scientific datasets, and the lack of automation to enable reproducibility at scale.

To solve these problems, we combine distant supervision and co-learning methods to provide high-quality labels with zero training data, and show that this approach outperforms even state-of-the-art (and expensive) supervised methods. We then use statistical claims extracted from the text of scientific papers to disambiguate schema mappings across disparate datasets. Finally, we automate experiments to verify extracted claims against the integrated data, helping researchers, journal editors, and curators hold scientists accountable for weakly reproducible results.

These approaches are already having an impact: computational biologists are beginning to use our curated gene expression corpus as the gold standard in the search for new cancer treatments, and social scientists are using our curated corpus of scientific figures to understand and optimize how researchers use visualization to communicate (an emerging field we call "viziometrics").

I’ll talk about these projects, then make a broader argument in favor of a research agenda in “epistemic databases”: systems that take a more active role in ensuring the validity (and ethical use) of conclusions drawn from the data they manage. Such systems will support features to ensure reproducibility, prevent p-hacking, balance privacy with transparency, enforce compliance with relevant laws, and combat algorithmic discrimination. I’ll also discuss some motivating applications in urban analytics, spanning homelessness and transportation, and some ongoing work to develop these features using "synthetic samples" as a core primitive.


Bio:
Bill Howe is Associate Professor in the Information School, Adjunct Associate Professor in Computer Science & Engineering and Electrical Engineering at the University of Washington. His research interests are in data management, curation, analytics, and visualization in the sciences. As Founding Associate Director of the UW eScience Institute, Howe played a leadership role in the Data Science Environment program at UW through a $32.8 million grant awarded jointly to UW, NYU, and UC Berkeley. With support from the MacArthur Foundation and Microsoft, Howe directs the Urbanalytics group at UW and UW's participation in the Cascadia Urban Analytics Cooperative with the University of British Columbia, where he focuses on responsible data-intensive urban science. He founded the UW Data Science Masters Degree and serves as its inaugural Program Director and Faculty Chair. He has received two Jim Gray Seed Grant awards from Microsoft Research for work on managing environmental data, has had two papers selected for VLDB Journal's "Best of Conference" issues (2004 and 2010), and co-authored what are currently the most-cited papers from both VLDB 2010 and SIGMOD 2012. Howe serves on the program and organizing committees for a number of conferences in the area of databases and scientific data management, developed a first MOOC on data science that attracted over 200,000 students across two offerings, and founded UW's Data Science for Social Good program. He has a Ph.D. in Computer Science from Portland State University and a Bachelor's degree in Industrial & Systems Engineering from Georgia Tech.

Host: Ian Foster

Refreshments served after the talk in Ryerson 255
