Annals of the New York Academy of Sciences

Reverse Engineering Biological Networks: Opportunities and Challenges in Computational Methods for Pathway Inference. Volume 1115, published November 2007
Ann. N.Y. Acad. Sci. 1115: 178–202 (2007). doi: 10.1196/annals.1407.020
Copyright © 2007 by the New York Academy of Sciences

Part V. Some Reverse Engineering Algorithms

Learning Regulatory Programs That Accurately Predict Differential Expression with MEDUSA

ANSHUL KUNDAJE (a), STEVE LIANOGLOU (a), XUEJING LI (b), DAVID QUIGLEY (c), MARTA ARIAS (d), CHRIS H. WIGGINS (e), LI ZHANG (f), AND CHRISTINA LESLIE (g)

Departments of (a) Computer Science, (b) Physics, (c) Biomedical Informatics, (d) Center for Computational Learning Systems, and Departments of (e) Applied Physics and Applied Mathematics and (f) Environmental Health Sciences, Columbia University, New York, New York, USA; (g) Computational Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York, USA

Key Words: systems biology • gene regulation • regulatory program • machine learning • boosting • gene expression • regulatory networks • DNA damage • hypoxia • yeast stress response

Address for correspondence: Christina Leslie, Computational Biology Program, Memorial Sloan-Kettering Cancer Center, 1275 York Ave, Box 460, New York, NY 10065. Voice: 646-888-2762; fax: 646-422-0717. cleslie{at}cbio.mskcc.org

Inferring gene regulatory networks from high-throughput genomic data is one of the central problems in computational biology. In this paper, we describe a predictive modeling approach for studying regulatory networks, based on a machine learning algorithm called MEDUSA. MEDUSA integrates promoter sequence, mRNA expression, and transcription factor occupancy data to learn gene regulatory programs that predict the differential expression of target genes. Instead of using clustering or correlation of expression profiles to infer regulatory relationships, MEDUSA determines condition-specific regulators and discovers regulatory motifs that mediate the regulation of target genes. In this way, MEDUSA meaningfully models biological mechanisms of transcriptional regulation. MEDUSA solves the problem of predicting the differential (up/down) expression of target genes by using boosting, a technique from statistical learning, which helps to avoid overfitting as the algorithm searches through the high-dimensional space of potential regulators and sequence motifs. Experimental results demonstrate that MEDUSA achieves high prediction accuracy on held-out experiments (test data), that is, data not seen in training. We also present context-specific analysis of MEDUSA regulatory programs for DNA damage and hypoxia, demonstrating that MEDUSA identifies key regulators and motifs in these processes. A central challenge in the field is the difficulty of validating reverse-engineered networks in the absence of a gold standard. Our approach of learning regulatory programs provides at least a partial solution for the problem: MEDUSA's prediction accuracy on held-out data gives a concrete and statistically sound way to validate how well the algorithm performs. With MEDUSA, statistical validation becomes a prerequisite for hypothesis generation and network building rather than a secondary consideration.
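The idea of boosting weak rules to predict up/down expression, and then scoring the model on held-out experiments, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses generic discrete AdaBoost with one-feature decision stumps, and it assumes a simplified feature encoding in which each feature is "motif present in the gene's promoter AND regulator active in the experiment". The toy data, the noise-free labeling rule, and all function names are hypothetical and chosen only for illustration.

```python
import math

def make_examples(motifs, regulators, labels, experiments):
    """Build one example per (gene, experiment) pair.

    Each feature is the AND of one motif bit and one regulator bit,
    so feature index = motif_index * n_regulators + regulator_index.
    """
    X, y = [], []
    for e in experiments:
        for g, gene_motifs in enumerate(motifs):
            X.append([m & r for m in gene_motifs for r in regulators[e]])
            y.append(labels[g][e])
    return X, y

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost; a stump predicts s if feature j is 1, else -s."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for s in (1, -1):
                # Weighted error of this stump under the current weights.
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if (s if xi[j] else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, s)
        err, j, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, s))
        # Up-weight misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * (s if xi[j] else -s))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the weighted vote: +1 (up) or -1 (down)."""
    score = sum(alpha * (s if x[j] else -s) for alpha, j, s in model)
    return 1 if score >= 0 else -1

def accuracy(model, X, y):
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)

# Hypothetical toy data: 4 genes x 2 motifs, 6 experiments x 2 regulators.
motifs = [[1, 0], [0, 1], [1, 1], [0, 0]]
regulators = [[1, 0], [0, 1], [1, 1], [0, 0], [0, 1], [1, 0]]
# Assumed ground truth: a gene is up (+1) only when it carries motif 0
# and regulator 1 is active in that experiment; otherwise down (-1).
labels = [[1 if m[0] and r[1] else -1 for r in regulators] for m in motifs]

train_exps, test_exps = [0, 1, 2, 3], [4, 5]  # experiments 4 and 5 are held out
X_train, y_train = make_examples(motifs, regulators, labels, train_exps)
X_test, y_test = make_examples(motifs, regulators, labels, test_exps)

model = train_adaboost(X_train, y_train, rounds=3)
heldout_accuracy = accuracy(model, X_test, y_test)
print("held-out accuracy:", heldout_accuracy)
```

Splitting by experiment rather than by individual example mirrors the abstract's notion of held-out experiments: the model is scored only on conditions it never saw during training, which is what makes the accuracy usable as a validation measure.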





