  • DocumentCode
    1784890
  • Title
    Network-constrained forest for regularized omics data classification
  • Author
    Andel, Michael ; Klema, Jiri ; Krejcik, Zdenek
  • Author_Institution
    Dept. of Comput. Sci., Czech Tech. Univ., Prague, Czech Republic
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    410
  • Lastpage
    417
  • Abstract
    Contemporary molecular biology deals with a wide and heterogeneous set of measurements to model and understand underlying biological processes, including complex diseases. Machine learning is a common approach to building such models. However, models built solely from measured data often suffer from overfitting, as the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that minimizes this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method on the task of disease classification based on measured mRNA and miRNA profiles, complemented by an interaction network composed of miRNA-mRNA target relations and mRNA-mRNA interactions corresponding to the interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently to improve the classification accuracy, stability and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome, which is our long-term research focus. We validate our approach on publicly available ovarian carcinoma data of the same form. We believe that the idea of a network-constrained forest can be straightforwardly generalized to arbitrary omics data with an available and non-trivial feature interaction network.
  • Keywords
    RNA; biochemistry; bioinformatics; cancer; classification; decision trees; feature extraction; gynaecology; learning (artificial intelligence); medical computing; molecular biophysics; patient diagnosis; proteins; random processes; sampling methods; arbitrary omics data; biological process; classification accuracy; complex disease; contemporary molecular biology; disease classification; encoded protein interaction; learning bias; mRNA profile measurement; mRNA-mRNA interaction; machine learning; measured feature number; miRNA profile measurement; miRNA-mRNA target relation; model overfitting minimization; molecular biology measurement; myelodysplastic syndrome; network-constrained forest; nontrivial feature interaction network; omics data classification regularization; ovarian carcinoma; prior knowledge; public domain; random forest-based classifier; sample size; Accuracy; Diseases; Gene expression; Proteins; Radio frequency; Vegetation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type
    conf
  • DOI
    10.1109/BIBM.2014.6999193
  • Filename
    6999193