[CS] Alexander Brace MS Presentation/Jul 11, 2025
via cs
cs at mailman.cs.uchicago.edu
Mon Jun 30 09:35:34 CDT 2025
This is an announcement of Alexander Brace's MS Presentation
===============================================
Candidate: Alexander Brace
Date: Friday, July 11, 2025
Time: 9 am CDT
Remote Location: https://uchicago.zoom.us/j/93581481621?pwd=huiWPkKNwe2RatnTxZsG0KHJIb6CSL.1
Location: JCL 346
Title: Fast homology detection across 250M proteins with deep learning
Abstract: Traditional tools for biological sequence comparison have struggled to scale with the exponential increase in data volume achieved by next-generation sequencing technology. A key aspect of sequence comparison is identifying biological features across different organisms that share a common evolutionary ancestry, generally referred to as homology detection. Seen through the lens of protein sequences, homology detection plays a central role in understanding their function and evolution. In this study, we demonstrate how protein language models and vector search enable fast and accurate homology detection across databases exceeding 250 million sequences. We investigate model scaling relationships under different modes of embedding vector compression. Notably, we find that homology search using binary embeddings, despite a loss of information, is both faster and more accurate than cosine similarity search in single precision. Leveraging binary vector search, we provide a precomputed search index for the entire UniProtKB (Swiss-Prot and TrEMBL), enabling rapid protein homology detection for both single sequences (14 s) and whole bacterial genomes (80 s). Our findings provide a scalable path to achieve billion-scale biological sequence analysis with artificial intelligence to improve function annotation and the discovery of new evolutionary relationships.
Advisors: Ian Foster, Arvind Ramanathan
Committee Members: Ian Foster, Arvind Ramanathan, Rick Stevens, Christopher Henry
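
Illustrative note on the abstract: the comparison between binary-embedding search and single-precision cosine search can be sketched roughly as below. This is a minimal sketch, not the method used in the presented work. It assumes sign thresholding for binarization (the abstract does not say how the embeddings are binarized) and a brute-force scan in place of a real vector index. The function names binarize, hamming_search, and cosine_search are hypothetical.

```python
import numpy as np

# Lookup table: number of set bits in each possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Pack float embeddings into bits (assumption: bit = 1 where value > 0)."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def hamming_search(query_bits: np.ndarray, index_bits: np.ndarray, k: int) -> np.ndarray:
    """Return the positions of the k rows closest to the query in Hamming distance."""
    xor = np.bitwise_xor(index_bits, query_bits)  # differing bits, byte by byte
    dists = POPCOUNT[xor].sum(axis=1)             # count differing bits per row
    return np.argsort(dists, kind="stable")[:k]


def cosine_search(query: np.ndarray, index: np.ndarray, k: int) -> np.ndarray:
    """Single-precision baseline: the k rows with highest cosine similarity."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    return np.argsort(-(m @ q), kind="stable")[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for protein language model embeddings (random, for illustration only).
    db = rng.standard_normal((1000, 256)).astype(np.float32)
    query = db[42] + 0.1 * rng.standard_normal(256).astype(np.float32)

    db_bits = binarize(db)
    print("float32 bytes per vector:", db[0].nbytes)
    print("binary bytes per vector: ", db_bits[0].nbytes)
    print("binary top hit:", hamming_search(binarize(query), db_bits, 1))
    print("cosine top hit:", cosine_search(query, db, 1))
```

One bit per component replaces 32 bits, so the packed index is 32 times smaller than its float32 counterpart, and distance computation reduces to XOR plus bit counting. Whether accuracy also improves, as the abstract reports, depends on the embedding model and is not something this toy example can show.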