[CS] Alexander Brace MS Presentation/Jul 11, 2025

via cs cs at mailman.cs.uchicago.edu
Mon Jun 30 09:35:34 CDT 2025


This is an announcement of Alexander Brace's MS Presentation
===============================================
Candidate: Alexander Brace

Date: Friday, July 11, 2025

Time: 9 am CDT

Remote Location: https://uchicago.zoom.us/j/93581481621?pwd=huiWPkKNwe2RatnTxZsG0KHJIb6CSL.1

Location: JCL 346

Title: Fast homology detection across 250M proteins with deep learning

Abstract: Traditional tools for biological sequence comparison have struggled to scale with the exponential increase in data volume achieved by next-generation sequencing technology. A key aspect of sequence comparison is identifying biological features across different organisms that share a common evolutionary ancestry, generally referred to as homology detection. Seen through the lens of protein sequences, homology detection plays a central role in understanding their function and evolution. In this study, we demonstrate how protein language models and vector search enable fast and accurate homology detection across databases exceeding 250 million sequences. We investigate model scaling relationships under different modes of embedding vector compression. Notably, we find that homology search using binary embeddings, despite a loss of information, is both faster and more accurate than cosine similarity search in single-precision. Leveraging binary vector search, we provide a precomputed search index for the entire UniProtKB (Swiss-Prot and TrEMBL), enabling rapid protein homology detection for both single sequences (14 s) and whole bacterial genomes (80 s). Our findings provide a scalable path to achieve billion-scale biological sequence analysis with artificial intelligence to improve function annotation and the discovery of new evolutionary relationships.
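As a rough illustration of the binary-versus-cosine comparison mentioned in the abstract, the sketch below contrasts Hamming-distance search over sign-binarized vectors with cosine similarity search over float32 vectors. It is not the candidate's implementation: the sign-threshold binarization, the brute-force scan, the random stand-in embeddings, and the function names (binarize, hamming_search, cosine_search) are all assumptions made for illustration.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Sign-binarize float embeddings and pack 8 dimensions per byte.

    Assumption: a dimension maps to 1 if it is positive, else 0.
    """
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_search(query_code: np.ndarray, db_codes: np.ndarray, k: int):
    """Return indices and Hamming distances of the k nearest packed codes."""
    xor = np.bitwise_xor(db_codes, query_code)       # differing bits, per byte
    dists = np.unpackbits(xor, axis=1).sum(axis=1)   # popcount per database row
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


def cosine_search(query: np.ndarray, db: np.ndarray, k: int):
    """Return indices and cosine similarities of the k most similar rows."""
    db_unit = db / np.linalg.norm(db, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = db_unit @ query_unit
    order = np.argsort(-sims, kind="stable")[:k]
    return order, sims[order]


# Random stand-ins for protein language model embeddings (hypothetical data).
rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 64)).astype(np.float32)
query = db[42] + 0.05 * rng.standard_normal(64).astype(np.float32)

db_codes = binarize(db)                      # 64 dims -> 8 bytes per vector
query_code = binarize(query[None, :])[0]

print(hamming_search(query_code, db_codes, k=3))
print(cosine_search(query, db, k=3))
```

In this toy setup each 64-dimensional float32 vector (256 bytes) shrinks to 8 bytes, and the distance computation reduces to XOR plus a bit count, which is one plausible reason binary search can be fast at large scale. A real index over hundreds of millions of sequences would presumably use a dedicated vector search library rather than a brute-force scan.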

Advisors: Ian Foster, Arvind Ramanathan

Committee Members: Ian Foster, Arvind Ramanathan, Rick Stevens, Christopher Henry


