[CS] Emma Peterson MS Presentation/Apr 19, 2024

meganwoodward at uchicago.edu
Mon Apr 15 13:48:44 CDT 2024

This is an announcement of Emma Peterson's MS Presentation.

Candidate: Emma Peterson

Date: Friday, April 19, 2024

Time: 11 am CT

Location: JCL 356

Title: MARI: A Usable, Mostly Automated Redaction Interface for Unstructured Text

Abstract: Social and behavioral science researchers often collect datasets of unstructured text, such as conversation transcripts. To benefit science, they often wish to share these datasets, yet first must redact identifiable information to protect participants' privacy. Doing so at scale requires mostly automated redaction tools that retain a dataset's utility by not over-redacting. To better understand requirements for such tools, we first interviewed ten data stewards about how they redact their own datasets, their threat models, and the information that would make their participants identifiable. They articulated nuanced conceptions of reidentifiability and the need to redact more than just direct personal identifiers. In response, we designed MARI, a human-in-the-loop redaction tool. Whereas existing tools focus on pattern matching (e.g., for addresses) and named-entity recognition, MARI incorporates additional linguistic features, a knowledge base, and language models to suggest redactions. Furthermore, MARI introduces a novel graphical workflow in which data stewards quickly evaluate proposed redactions. In an ablation study and comparison to three commercial tools and one academic tool, we evaluate MARI's redactions on a public dataset of caregiver-child conversations and a synthetic dataset representing data stewards' additional concerns. We find that MARI suggests a number of redactions existing tools miss, especially in demographic and linguistic categories.

Advisor: Blase Ur

Committee Members: Blase Ur, Marshini Chetty, and Alexander Kale
