<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div class="elementToProof" style="color: rgb(0, 0, 0);"><span style="font-family: Helvetica; font-size: 12px;">This is an announcement of Emma Peterson's MS Presentation</span><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 11pt;"><br>

</span><span style="font-family: Helvetica; font-size: 12px;">===============================================</span><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 11pt;"><br>

</span><span style="font-family: Helvetica; font-size: 12px;">Candidate: Emma Peterson</span><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 11pt;"><br>

<br>

</span><span style="font-family: Helvetica; font-size: 12px;">Date: Friday, April 19, 2024</span><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 11pt;"><br>

<br>

</span><span style="font-family: Helvetica; font-size: 12px;">Time: 11 am CT</span><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 11pt;"><br>

<br>

</span><span style="font-family: Helvetica; font-size: 12px;">Location: JCL 356</span><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 11pt;"><br>

<br>

</span><span style="font-family: Helvetica; font-size: 12px;">Title: MARI: A Usable, Mostly Automated Redaction Interface for Unstructured Text</span><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 11pt;"><br>

<br>

</span><span style="font-family: Helvetica; font-size: 12px;">Abstract: Social and behavioral science researchers often collect datasets of unstructured text, such as conversation transcripts. To benefit science, they frequently wish to share these datasets, yet

 they must first redact identifiable information to protect participants' privacy. Doing so at scale requires mostly automated redaction tools that retain a dataset's utility by not over-redacting. To better understand requirements for such tools, we first interviewed

 ten data stewards about how they redact their own datasets, their threat models, and the information that would make their participants identifiable. They articulated nuanced conceptions of reidentifiability and the need to redact more than just direct personal

 identifiers. In response, we designed MARI, a human-in-the-loop redaction tool. Whereas existing tools focus on pattern matching (e.g., for addresses) and named-entity recognition, MARI incorporates additional linguistic features, a knowledge base, and language

 models to suggest redactions. Furthermore, MARI introduces a novel graphical workflow in which data stewards quickly evaluate proposed redactions. In an ablation study and comparison to three commercial tools and one academic tool, we evaluate MARI's redactions

 on a public dataset of caregiver-child conversations and a synthetic dataset representing data stewards' additional concerns. We find that MARI suggests a number of redactions existing tools miss, especially in demographic and linguistic categories.</span><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 11pt;"><br>

<br>

</span><span style="font-family: Helvetica; font-size: 12px;">Advisor: Blase Ur</span><span style="font-family: Calibri, Helvetica, sans-serif; font-size: 11pt;"><br>

<br>

</span></div>

<div class="elementToProof" style="font-family: Helvetica; font-size: 12px; color: rgb(0, 0, 0);">

Committee Members: Blase Ur, Marshini Chetty, and Alexander Kale</div>


</body>

</html>