[Colloquium] Ozan Gokdemir MS Presentation/Mar 28, 2024

meganwoodward at uchicago.edu
Fri Mar 15 08:29:29 CDT 2024


This is an announcement of Ozan Gokdemir's MS Presentation
===============================================
Candidate: Ozan Gokdemir

Date: Thursday, March 28, 2024

Time: 1 pm CT

Location: JCL 298

Title: Retrieval-Augmented Scientific Hypothesis Generation

Abstract: Staying abreast of the latest developments is crucial for scientists seeking to advance their fields through novel hypotheses. The rapid pace of scientific progress, however, makes it impossible for any individual scientist to keep up with the deluge of information. The National Science Foundation reports a more than 50-fold increase in the number of science and engineering articles published annually in open-access journals over the past two decades, from 19,000 in 2003 to 992,000 in 2022. At this scale, computational tools are needed to help scientists synthesize a wide array of recent findings into the next generation of scientific hypotheses.

As the driving force behind recent transformative advances in Natural Language Processing, Large Language Models (LLMs) have emerged as strong candidates for powering an AI-driven scientific hypothesis generation engine. Despite their utility for associative memory and knowledge synthesis, LLMs often lack up-to-date and context-specific knowledge, which hinders their ability to produce novel, factually grounded hypotheses. Retrieval-Augmented Generation (RAG) has recently emerged as a way to remedy these shortcomings: the LLM is equipped with a corpus of relevant documents that it can consult on the fly when answering questions. Within the RAG framework, the reference corpus can be kept up to date by adding newly available information and removing outdated content. RAG also makes LLM responses more grounded, because the model can cite the sources that inform them.
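To make the retrieval side of this workflow concrete, here is a minimal Python sketch. It is not the method used in the thesis. It assumes a bag-of-words cosine retriever as a stand-in for a learned embedding model, uses hypothetical names (`Corpus`, `build_prompt`) and placeholder documents, and omits the LLM call itself. It shows only three steps: keeping a corpus current with add/remove, fetching the documents most relevant to a question, and tagging them with ids so a model could cite them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a simple stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Corpus:
    """Hypothetical reference corpus that can be kept current by adding and removing documents."""

    def __init__(self):
        self.docs = {}  # doc_id -> (text, term counts)

    def add(self, doc_id, text):
        self.docs[doc_id] = (text, Counter(tokenize(text)))

    def remove(self, doc_id):
        # Dropping outdated content is just deleting its entry.
        self.docs.pop(doc_id, None)

    def retrieve(self, query, k=2):
        """Return up to k doc ids ranked by similarity, skipping documents with no overlap."""
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), doc_id) for doc_id, (_, vec) in self.docs.items()]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(corpus, question, k=2):
    """Prepend retrieved passages, tagged with their ids so the model can cite them."""
    ids = corpus.retrieve(question, k)
    context = "\n".join(f"[{i}] {corpus.docs[i][0]}" for i in ids)
    return f"Context:\n{context}\n\nQuestion: {question}\nCite sources by [id]."


if __name__ == "__main__":
    corpus = Corpus()
    corpus.add("doc-a", "Placeholder note on low-dose radiation and DNA repair.")
    corpus.add("doc-b", "Placeholder note on SARS-CoV-2 spike protein and ACE2 binding.")
    corpus.add("doc-c", "Placeholder note on antimicrobial peptide design.")
    corpus.remove("doc-c")  # simulate removing outdated content
    print(build_prompt(corpus, "How does low-dose radiation affect DNA repair?"))
```

A full system would typically replace the word-count vectors with dense embeddings and a vector index, and pass the built prompt to an LLM; the add/remove, retrieve, and cite cycle stays the same.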

This thesis aims to make three contributions. First, we survey the state of the art in RAG techniques. Second, we assess RAG's applicability to scientific hypothesis generation in domains such as low-dose radiation biology, SARS-CoV-2 protein interactions, and antimicrobial peptide generation. Third, we present preliminary results from a comparative analysis of the factuality, relevance, and breadth of hypotheses that a model generates with retrieval augmentation versus those it generates from its baseline parametric memory alone.

Advisor: Rick Stevens

Committee Members: Ian Foster, Arvind Ramanathan, and Rick Stevens


