[Colloquium] Research at TTIC: Stefan Canzar, TTIC

Mary Marre mmarre at ttic.edu
Fri May 22 09:59:39 CDT 2015


When:     Friday, May 29th at noon

Where:    TTIC, 6045 S Kenwood Avenue, 5th Floor, Room 526

Who:      Stefan Canzar, TTIC

Title:    Algorithmic Challenges in Next-Generation Sequencing Puzzles


*Abstract:*

Next-generation sequencing (NGS) technology allows us to rapidly sequence
many millions of DNA molecules and has been used to address a wide range of
fundamental biological questions. Every short DNA sequence ('read')
generated by NGS instruments carries little information by itself, and thus
the reconstruction of the desired biological measurement involves solving a
complex computational puzzle.

One of the most popular assays sequences complementary DNA copies of
RNA transcripts (RNA-seq). Current methods that try to reconstruct the
cellular transcriptome from RNA-seq data, however, suffer from various
simplifications made to cope with the combinatorial complexity of the
problem. As a consequence, many false-negative isoforms are due to splice
junctions that remain undiscovered by the preceding read alignment step.

We therefore propose a novel method that integrates transcript discovery
and quantification based on the complete set of isoforms, that is,
independent of discovered splice junctions. After an initial prediction,
our algorithm digs deeper into the space of all possible isoforms through
an iterative scheme. In each iteration, a transcript that improves the
prediction is determined by solving an optimization problem, yielding a
boost in recall, especially among low-expressed transcripts.

If a reference genome is available for the sequenced organism, the relative
location of the puzzle pieces can be determined by mapping the reads back
to their origin in the reference genome. Sequencing errors, repetitive
regions, and genuine differences between reference and donor genome lead to
ambiguous mappings, which often imply false-positive predictions of
polymorphisms. In the second part of my talk I will formulate the problem
of resolving multi-mapping reads as the maximum facility location problem,
for which we propose LP-rounding heuristics. We provide a theoretical
guarantee on the quality of the solution and demonstrate the utility of our
algorithm in resolving conflicting deletions implied by reads mapping
ambiguously to Craig Venter's genome model.



***************************************
Research at TTIC Seminar Series

TTIC is hosting a weekly seminar series presenting the research currently
underway at the Institute. Every week a different TTIC faculty member will
present their research. The lectures are intended both for students
seeking research topics and advisers, and for the general TTIC and
University of Chicago communities interested in hearing what their
colleagues are up to.

To receive announcements about the seminar series, please subscribe to the
mailing list: https://groups.google.com/a/ttic.edu/group/talks/subscribe

Speaker details can be found at: http://www.ttic.edu/tticseminar.php.

For additional questions, please contact David McAllester at
mcallester at ttic.edu





Mary C. Marre
Administrative Assistant
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 504*
*Chicago, IL  60637*
*p: (773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu*