[Colloquium] REMINDER: 4/8 Talks at TTIC: Yun William Yu, University of Toronto

Mary Marre mmarre at ttic.edu
Thu Apr 7 14:49:35 CDT 2022


*When:*        Friday, April 8th at *11:00 am CT*


*Where:*       Talk will be given *live, in-person* at

                   TTIC, 6045 S. Kenwood Avenue

                   5th Floor, Room 530


*Where:*       Zoom Virtual Talk (*register in advance here*
<https://uchicagogroup.zoom.us/webinar/register/WN_M1hFlSY5R9SypAaY7cqEtw>)


*Who:*         Yun William Yu, University of Toronto




*Title:*          Compressive Hash-Based Feature Selection in Bio(Medical)
Informatics


*Abstract:* The selection of subsampled features from sets is one of the
primitive tasks enabling efficient biomedical algorithms. One classical
approach is to apply a hash function to the set and keep only the minimum
hashed values; with slight variations in context, this gives rise both to
MinHash, a probabilistic sketch for estimating the Jaccard index between
sets, and to minimizers, a k-mer selection scheme for finding sparse
anchors along genomic sequences. More recently, open sync-mers were
introduced in the literature as an alternative to minimizers, and they turn
out to have some nice theoretical properties.
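
As a rough illustration of the "keep only the minimum hashed values" idea,
the Python sketch below implements the three selection schemes named above:
MinHash, (w, k)-minimizers, and open syncmers as commonly defined (a k-mer is
kept when its smallest-hashing s-mer starts at offset t). It is a sketch under
stated assumptions, not the speakers' code: the salted-BLAKE2b hash, the
function names, and the example parameters (k, w, s, t) are hypothetical
choices made here.

```python
import hashlib


def h(item: str, seed: int = 0) -> int:
    """Hypothetical 64-bit hash: BLAKE2b salted with the seed (one "hash function" per seed)."""
    digest = hashlib.blake2b(item.encode(), digest_size=8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """MinHash: under each of num_hashes hash functions, keep only the minimum hash value."""
    return [min(h(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of hash functions whose minima agree estimates |A & B| / |A | B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def minimizers(seq: str, k: int, w: int) -> set[int]:
    """Start positions of (w, k)-minimizers: in every window of w consecutive k-mers,
    keep the k-mer with the smallest hash (ties go to the leftmost k-mer)."""
    hashes = [h(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(hashes) - w + 1):
        selected.add(min(range(start, start + w), key=lambda i: hashes[i]))
    return selected


def open_syncmers(seq: str, k: int, s: int, t: int = 0) -> set[int]:
    """Start positions of open syncmers: a k-mer is kept when its smallest-hashing
    s-mer starts at offset t (0 <= t <= k - s). Only the k-mer itself is examined."""
    selected = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smer_hashes = [h(kmer[j:j + s]) for j in range(k - s + 1)]
        if min(range(k - s + 1), key=lambda j: smer_hashes[j]) == t:
            selected.add(i)
    return selected


if __name__ == "__main__":
    a = {"fever", "cough", "fatigue", "headache"}
    b = {"fever", "cough", "nausea", "headache"}
    # The true Jaccard index is 3/5; the estimate should land near 0.6.
    print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
    seq = "ACGTTGCATGCAGTCAGGTACCATGACGTTAGC"
    print(sorted(minimizers(seq, k=5, w=4)))
    print(sorted(open_syncmers(seq, k=5, s=2, t=0)))
```

One difference shows up directly in the code: a minimizer decision depends on
the neighboring k-mers in each window, whereas an open-syncmer decision
depends only on the k-mer itself.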

In this talk, we cover a couple of related topics. First, we discuss
applications of MinHash to federated clinical queries and show that lossily
compressing MinHash buckets using a floating-point encoding reduces space
complexity from O(log n) to O(log log n). Second, we carefully analyze open
sync-mers, prove an optimal choice of parameters for open sync-mers under a
point-mutation k-mer conservation model, and show that these choices can
improve read-mapping chaining scores. Time permitting, we may discuss some
additional theoretical connections between minimum-hashing-based methods and
other modern approaches to feature selection, but that may be a stretch
goal.
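
The O(log n) to O(log log n) claim rests on a simple observation: if hash
values are read as uniform numbers in [0, 1), the minimum over n items is
typically about 1/n, so keeping enough precision to tell n items' hashes
apart takes on the order of log2 n bits, while storing only the binary
exponent of the minimum (a number around log2 n) takes roughly log2 log2 n
bits, plus a constant number of mantissa bits. The Python sketch below
illustrates that kind of encoding under stated assumptions; it is not the
speakers' scheme, and the salted-BLAKE2b hash, the function names, and
mantissa_bits = 4 are hypothetical choices made here. With these parameters
each bucket shrinks from 64 bits to at most 7 + 4 = 11 bits.

```python
import hashlib

HASH_BITS = 64


def h(item: str, seed: int = 0) -> int:
    """Hypothetical 64-bit hash: BLAKE2b salted with the seed."""
    digest = hashlib.blake2b(item.encode(), digest_size=HASH_BITS // 8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little")


def compress_min_hash(min_hash: int, mantissa_bits: int = 4) -> tuple[int, int]:
    """Lossy floating-point-style code for one bucket's minimum hash.

    Reading the hash as a binary fraction in [0, 1), the exponent is the number of
    leading zero bits and the mantissa is the next `mantissa_bits` bits after the
    leading one. Everything below that precision is discarded.
    """
    if min_hash == 0:
        return HASH_BITS, 0
    length = min_hash.bit_length()
    exponent = HASH_BITS - length        # leading zeros; typically about log2(n) for n items
    below_leading_one = length - 1       # number of bits that follow the leading one
    mask = (1 << mantissa_bits) - 1
    if below_leading_one >= mantissa_bits:
        mantissa = (min_hash >> (below_leading_one - mantissa_bits)) & mask
    else:                                # pad short values with zero bits on the right
        mantissa = (min_hash << (mantissa_bits - below_leading_one)) & mask
    return exponent, mantissa


def compressed_signature(items: set[str], num_hashes: int = 64,
                         mantissa_bits: int = 4) -> list[tuple[int, int]]:
    """MinHash signature with each bucket stored as an (exponent, mantissa) pair."""
    return [compress_min_hash(min(h(x, seed) for x in items), mantissa_bits)
            for seed in range(num_hashes)]


if __name__ == "__main__":
    items = {f"record-{i}" for i in range(1000)}
    exponent, mantissa = compressed_signature(items, num_hashes=1)[0]
    # The exponent is typically near log2(1000), about 10, so it fits in a few bits.
    print(exponent, mantissa)
```

Because the encoding is lossy, two different minima can map to the same
(exponent, mantissa) pair, so a Jaccard estimate computed from these codes
would tend to run high unless that collision rate is corrected for; the
sketch leaves that correction out.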


No knowledge of genomics or medical informatics will be needed to follow
this talk.

Joint work with Jim Shaw and Griffin Weber.

*Bio:* William Yu is an assistant professor of mathematics at the
University of Toronto. He trained under Bonnie Berger at MIT for his PhD
and was a postdoc at Harvard Medical School with Griffin Weber.

*Host:* *Avrim Blum* <avrim at ttic.edu>




Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Chicago, IL  60637*
*mmarre at ttic.edu*

