[Colloquium] Shi/Dissertation Defense/11-3-2008

Margaret Jaffey margaret at cs.uchicago.edu
Mon Oct 20 14:26:33 CDT 2008


			Department of Computer Science/The University of Chicago

					   *** Dissertation Defense ***


Candidate:  Xinghua (Mindy) Shi

Date:  Monday, November 3, 2008

Time and Location:  9:30 a.m. in RI 405 (Research Institutes)

Title:  System and Tools to Support a Bayesian Approach to Improving  
Large-Scale Metabolic Models

Abstract:
With the rapid availability of hundreds to thousands of sequenced
genomes, the construction of genome-scale metabolic models for these
organisms has attracted much attention. Although current genome/pathway
databases provide a large proportion of the metabolic information that
can be used directly to build metabolic models, a number of problems
still introduce network holes and thus leave these models incomplete.
Network holes occur when the network is disconnected and certain
metabolites cannot be produced or consumed. Several factors can lead to
network holes, such as missing genes, incorrect or missing annotations,
and poor mappings from functions to biochemical reactions.
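
As a rough illustration of the definition above (not the detection
method used in the dissertation), metabolites that are never produced or
never consumed can be found by comparing the substrate and product sets
of a model. The reaction layout and the function name below are
hypothetical, and every reaction is assumed to be irreversible.

```python
def find_dead_end_metabolites(reactions):
    """Return (never_produced, never_consumed) metabolite sets.

    `reactions` maps a reaction id to a (substrates, products) pair.
    This layout is an assumption made for the sketch; reversibility and
    exchange reactions are deliberately ignored.
    """
    produced = set()
    consumed = set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    # A metabolite that is consumed but never produced (or the reverse)
    # marks a candidate network hole.
    never_produced = consumed - produced
    never_consumed = produced - consumed
    return never_produced, never_consumed


# Toy network: A -> B -> C, plus a disconnected reaction D -> E.
toy = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"D"}, {"E"}),
}
never_produced, never_consumed = find_dead_end_metabolites(toy)
```

In a real model, boundary metabolites such as nutrients and secreted
products would be handled by exchange reactions and excluded from the
list of holes.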

To date, the construction of genome-scale metabolic models is still
dominated by manual searches for candidates to fill network holes.
Because this manual work is time-consuming and labor-intensive, only two
dozen such models have been published. To construct genome-scale
metabolic models for the hundreds to thousands of organisms available,
it is desirable to apply computational approaches that accelerate the
model-building process.

Toward the automatic reconstruction of metabolic models, we propose
STeAM, a system and tools to support a Bayesian approach to improving
genome-scale metabolic models. An infrastructure that incorporates all
of the computational tools is built to enable experiments with
computational tools and schemes in STeAM. First, a set of tools is
designed to integrate and reconcile data from a variety of databases,
namely a genomic database, the SEED; a genomic and pathway database,
KEGG; and a database of published genome-scale metabolic models, BiGG.
Next, network connectivity is analyzed and network holes are detected.

To fill network holes, data from these databases are organized,
computed, and processed to prepare for the construction of reaction
predictors that can generate candidate hole-filling reactions. In total,
23 types of evidence are extracted from the SEED, KEGG, and BiGG. This
topological and biological evidence can be categorized as follows.
(i) At the gene level, three types of evidence are collected from 560
complete genomes in the SEED: gene co-occurrence, gene co-occurrence in
gene clusters, and the co-occurrence of gene-genes pairs in gene
clusters. (ii) At the reaction level, ten types of evidence are
collected: reaction priors and reaction co-occurrence in five data
resources. These five data resources are the reconstructed iJR904 and
iSB619 models in BiGG, the reference pathway map in KEGG, network
modules of KEGG, 736 organism maps in KEGG, and 560 draft models in the
SEED. (iii) At the segment level, segment priors and the co-occurrence
of reaction-segment pairs are extracted from the same five data
resources as at the reaction level. After the evidence is obtained from
existing databases, 23 individual predictors are created to use this
evidence based on Bayesian approaches. Then, to combine these individual
predictors and unify their predictive results, an ensemble of individual
predictors is built on majority vote and four classifiers: Naive Bayes
Classifier, Bayesian Network, Multilayer Perceptron Network, and
AdaBoost.
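
The majority-vote step of such an ensemble can be sketched as follows.
This is only a minimal illustration under stated assumptions: each
individual predictor is assumed to emit a boolean vote per candidate
reaction, ties are treated as rejections, and the function names and
reaction ids are hypothetical rather than taken from STeAM.

```python
def majority_vote(votes):
    """Return True when strictly more than half of the votes are True."""
    return sum(votes) * 2 > len(votes)


def accept_candidates(predictions):
    """Return the sorted ids of candidates accepted by majority vote.

    `predictions` maps a candidate reaction id to the list of boolean
    votes cast by the individual predictors (a hypothetical layout).
    """
    return sorted(
        reaction_id
        for reaction_id, votes in predictions.items()
        if majority_vote(votes)
    )


predictions = {
    "R_fill_1": [True, True, False],   # 2 of 3 predictors agree
    "R_fill_2": [False, False, True],  # 1 of 3 predictors agrees
    "R_fill_3": [True, False],         # tie, treated as a rejection
}
accepted = accept_candidates(predictions)
```

The classifier-based ensembles named above would replace the fixed
voting rule with a model trained on the predictors' outputs.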

Three sets of experiments are performed to train and test the individual
predictors and the mechanisms that integrate them, and ultimately to
evaluate the performance of the system and its computational tools. The
first set of experiments involves a self-consistency check of the two
reconstructed models, iJR904 and iSB619, through "Knockout and Recover"
of the core metabolic subnetwork. The second set of experiments focuses
on how the deletion of different parts of a model, where the deletion is
either totally random or based on connected subgraphs of the model,
affects the recovery ability of the computational methods. The third set
of experiments uses a new genome-scale metabolic model for
C. acetobutylicum as a test model by improving its draft model from the
SEED. The thorough analysis of various data and the new results gained
from the experiments not only provide insight into the properties of
metabolic networks, but also reveal the meanings of and relationships
among different data entities. Moreover, this newly discovered knowledge
can feed back into existing data resources and enhance our current
knowledge of genome annotations and metabolic models.
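
A random "Knockout and Recover" trial of the kind described above might
be scored as in the sketch below. The scoring by recall and precision,
the `predictor` callback, and the toy reaction ids are all assumptions
made for illustration; the abstract does not state which metrics the
dissertation reports.

```python
import random


def knockout_and_recover(model_reactions, predictor, fraction=0.2, seed=0):
    """Delete a random subset of reactions and score their recovery.

    `predictor` is a hypothetical callable that takes the remaining
    reactions and returns an iterable of proposed reaction ids.
    Returns (recall, precision) over the deleted reactions.
    """
    rng = random.Random(seed)
    reactions = sorted(model_reactions)
    k = max(1, int(len(reactions) * fraction))
    removed = set(rng.sample(reactions, k))
    remaining = set(reactions) - removed
    proposed = set(predictor(remaining))
    recovered = removed & proposed
    recall = len(recovered) / len(removed)
    precision = len(recovered) / len(proposed) if proposed else 0.0
    return recall, precision


# Toy setup: a 5-reaction model inside a 10-reaction universe, with a
# naive predictor that proposes every reaction missing from the model.
universe = {f"R{i}" for i in range(1, 11)}
model = {f"R{i}" for i in range(1, 6)}
recall, precision = knockout_and_recover(
    model, lambda remaining: universe - remaining
)
```

The naive predictor recovers every deleted reaction but at low
precision, which is the trade-off a real predictor has to improve on.
Deleting connected subgraphs instead of random reactions would only
change how `removed` is chosen.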

Candidate's Advisor:  Prof. Rick Stevens

A draft copy of Ms. Shi's dissertation will be available in Ry 156.


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey                             margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)        (773) 702-6011
The University of Chicago                  http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=




