[Colloquium] CS Seminar today: Sanjay Krishnan, UC Berkeley

Sandra Wallace via Colloquium colloquium at mailman.cs.uchicago.edu
Tue Feb 20 08:39:09 CST 2018


UNIVERSITY OF CHICAGO
DEPARTMENT OF COMPUTER SCIENCE
PRESENTS



Sanjay Krishnan
University of California, Berkeley


Tuesday, February 20, 2018 at 3:30 pm 
Ryerson 251


Title:  The Statistics of Dirty Data

Abstract:
A statistical model is only as good as its training data. Systematic errors can arise when data are integrated from untrustworthy sources, collected in mixed formats, or contain inconsistent references to the same real-world entities. This talk describes the classical relational database topic of "data cleaning", i.e., the process of transforming the data to remove such issues, from a modern statistical perspective. My talk emphasizes two central themes: (1) analyzing data cleaning algorithms using statistical theory on sample complexity and generalization and (2) building data cleaning systems for emerging statistical machine learning and AI applications. My results include new error bounds for query processing after data cleaning, learning-theoretic models for understanding the accuracy of data transformation rules on unseen data, and experimental results on the design of scalable data cleaning systems deployed in applications ranging from real-time robot learning to investigative journalism. I conclude by describing our ongoing effort on a system called AlphaClean, which leverages reinforcement learning to synthesize data cleaning programs for highly unstructured data cleaning problems.

 
Bio:
Sanjay Krishnan is a Computer Science PhD candidate in the RISELab and in the AUTOLAB (Berkeley Laboratory for Automation Science and Engineering) at UC Berkeley. His research studies problems at the intersection of database theory, machine learning, and robotics. Sanjay's work has received a number of awards, including the 2016 SIGMOD Best Demonstration award, the 2015 IEEE GHTC Best Paper award, and the Sage Scholar award.

Website: https://www.ocf.berkeley.edu/~sanjayk/


Host:  Aaron Elmore

Refreshments served after the talk in Ry. 255

Link to PDF:  https://www.cs.uchicago.edu/sites/cs/files/uploads/seminar_announcements/Krishnanposter.pdf