[Colloquium] Emily Wenger Dissertation Defense - Apr 10, 2023

Megan Woodward meganwoodward at uchicago.edu
Fri Apr 7 08:47:50 CDT 2023


This is an announcement of Emily Wenger's Dissertation Defense.
===============================================
Candidate: Emily Wenger

Date: Monday, April 10, 2023

Time: 1:30 pm CDT

Remote Location: https://uchicago.zoom.us/j/4571855061?pwd=d1BURDhucEVOR1JjaVY2V2cvRENWZz09 (Meeting ID: 457 185 5061, Passcode: 645188)

Location: JCL 298

Title: Reclaiming Data Agency in the Age of Ubiquitous Machine Learning

Abstract: As machine learning (ML) models have grown in size and scope in recent years, so has the amount of data needed to train them. Unfortunately, individuals whose data is used in large-scale ML models may face unwanted consequences. Such data use may violate individuals' privacy or enroll them in an unwanted ML application. Furthermore, recent advances have greatly enhanced models' ability to generate synthetic data like text and images. This has unleashed a fresh wave of privacy and intellectual property concerns, as generative models are trained on massive datasets scraped from the internet and can memorize and regurgitate their training data.

While user data privacy issues are well-recognized in the ML research community, most attempts to address them take a model-centric approach. Existing solutions assume that model trainers are well-intentioned and that data has been obtained with consent, or that data use is inevitable and the best path forward is to mitigate privacy risks. These solutions work but overlook a significant problem: data is often taken without consent, and users do not trust model trainers.

This raises the question: what if data use were not inevitable? What if, instead, users had agency over how and whether their data is used in ML systems? This thesis argues that data agency, the ability to know and control how and whether one's data is used in ML systems, is an important complement to existing ML data privacy protection approaches. Such agency would shift the current power dynamic, which renders users helpless at the hands of model creators, and help users control their digital destinies. Solutions of this nature would augment current work on data privacy, giving users, not just model trainers, control over how their data is used.

This thesis explores solutions that provide users with data agency against large-scale ML systems, allowing individuals to disrupt or discover the use of their data in such systems. It proposes three solutions that prevent or trace data use in ML systems or, in extreme cases, directly attack the ML system. It focuses on the use case of large-scale facial recognition (FR) systems, a machine learning technology that has recently become a flashpoint for civil liberties and privacy issues. With this use case in mind, the thesis concludes by developing a framework for reasoning broadly about FR data agency.

Advisors: Ben Zhao and Heather Zheng

Committee Members: Ben Zhao, Heather Zheng, Yuxin Chen, and Aloni Cohen

