Title :
Multiscale Binarization of Gene Expression Data for Reconstructing Boolean Networks
Author :
Hopfensitz, M. ; Mussel, C. ; Wawra, C. ; Maucher, M. ; Kuhl, M. ; Neumann, H. ; Kestler, H.A.
Author_Institution :
Res. Group of Bioinf. & Syst. Biol., Ulm Univ., Ulm, Germany
Abstract :
Network inference algorithms can assist life scientists in unraveling gene-regulatory systems on a molecular level. In recent years, great attention has been drawn to the reconstruction of Boolean networks from time series. These time series need to be binarized, as such networks model genes as binary variables (either "expressed" or "not expressed"). Common binarization methods often cluster measurements or separate them according to statistical or information-theoretic characteristics and may require many data points to determine a robust threshold. Yet, time series measurements frequently comprise only a small number of samples. To overcome this limitation, we propose a binarization that incorporates measurements at multiple resolutions. We introduce two such binarization approaches, which determine thresholds based on limited numbers of samples and additionally provide a measure of threshold validity. Thus, network reconstruction and further analysis can be restricted to genes with meaningful thresholds. This reduces the complexity of network inference. The performance of our binarization algorithms was evaluated in network reconstruction experiments using artificial data as well as real-world yeast expression time series. The new approaches yield considerably improved correct network identification rates compared to other binarization techniques by effectively reducing the number of candidate networks.
Keywords :
Boolean functions; binary sequences; biology computing; genetics; inference mechanisms; microorganisms; molecular biophysics; time series; Boolean networks; binarization; binary variables; gene expression data; gene-regulatory systems; multiscale binarization; network inference; network inference algorithms; yeast expression time series; Approximation error; Bioinformatics; Complexity theory; Computational biology; Gene expression; Time measurement; Time series analysis; gene-regulatory networks; reconstruction; Algorithms; Databases, Genetic; Gene Expression Profiling; Gene Regulatory Networks; Models, Genetic; Saccharomyces cerevisiae
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2011.62