[Colloquium] Reminder - Mansi Sakarvadia MS Presentation/Apr 5, 2024

Megan Woodward meganwoodward at uchicago.edu
Fri Apr 5 08:18:43 CDT 2024


This is an announcement of Mansi Sakarvadia's MS Presentation
===============================================
Candidate: Mansi Sakarvadia

Date: Friday, April 05, 2024

Time: 2 pm CT

Remote Location: https://uchicago.zoom.us/j/9643461977?pwd=VUJhRVNiQ0lHMWdGUVVNRE9rOUdNQT09

Meeting ID: 964 346 1977

Passcode: 109461

Location: JCL 390

Title: Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

Abstract: Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Language models (LMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single- and multi-hop prompts. We then propose a mechanism that allows users to inject relevant prompt-specific information, which we refer to as "memories," at critical LM locations during inference. By enabling the LM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks by up to 424%. From this work, we observe that small subsets of attention heads can significantly impact the model's prediction during multi-hop reasoning. To interpret these heads more faithfully, we develop Attention Lens: an open-source tool that translates the outputs of attention heads into vocabulary tokens via learned transformations called lenses. We demonstrate how lenses can reveal how a model arrives at its answer and use them to localize sources of model failures, such as biased and malicious language generation.
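For intuition, the following is a minimal, hypothetical sketch of a memory injection at inference time, written against the Hugging Face transformers GPT-2 implementation. The layer index, injection scale, memory word, and injection site (the attention block's output at the final token position) are illustrative assumptions for this sketch, not the settings or exact mechanism from the thesis.

    # Hypothetical sketch: inject a "memory" vector into a GPT-2 attention
    # block during inference. LAYER, SCALE, and the memory word are
    # illustrative choices, not the thesis' actual settings.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    LAYER = 6          # assumed: which attention block to modify
    SCALE = 4.0        # assumed: injection magnitude
    memory = " France" # prompt-specific fact the model failed to retrieve

    # Embed the memory token with the model's own input embedding matrix.
    mem_id = tokenizer(memory, return_tensors="pt").input_ids[0, 0]
    mem_vec = model.transformer.wte.weight[mem_id].detach()

    def inject(module, inputs, output):
        # The attention module returns a tuple whose first element is the
        # attention output of shape (batch, seq_len, hidden). Add the memory
        # vector at the final token position, where the next-token
        # prediction is formed.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden.clone()
        hidden[:, -1, :] += SCALE * mem_vec
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden

    handle = model.transformer.h[LAYER].attn.register_forward_hook(inject)

    # Multi-hop prompt: Eiffel Tower -> France -> Paris.
    prompt = "The capital of the country where the Eiffel Tower is located is"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    print(tokenizer.decode(logits.argmax()))

    handle.remove()  # restore the unmodified model

The forward hook leaves the model's weights untouched; removing the hook restores the original behavior, which is what makes this kind of intervention cheap to apply per prompt.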

Advisors: Ian Foster and Kyle Chard

Committee Members: Ian Foster, Kyle Chard, and Ari Holtzman

