• DocumentCode
    1828290
  • Title
    Ensemble Feature Selection Methods for a Better Regularization of the Lasso Estimate in P >> N Gene Expression Datasets
  • Author
    Aloraini, Adel
  • Author_Institution
    Comput. Sci. Dept., Qassim Univ., Qassim, Saudi Arabia
  • Volume
    2
  • fYear
    2013
  • fDate
    4-7 Dec. 2013
  • Firstpage
    122
  • Lastpage
    126
  • Abstract
    The problem of variable selection from a large number of candidate predictors has recently been addressed in machine learning for bioinformatics. This is driven by advances in high-throughput microarray technologies, such as Affymetrix GeneChips and Illumina microarrays, which allow thousands of genes to be studied in a single experiment. However, the data produced by such genomic tools suffer from the p ≫ n problem, where the number of genes (p) to be examined is much larger than the number of samples (n). Model selection in this setting is hard, and the goal is to infer models that give accurate predictions while remaining interpretable. Towards this goal, this work experiments with feature selection methods and shows how to improve the choice of the tuning parameter s in the lasso estimate by adding an extra layer of filter feature selection methods to the lasso estimate path when learning from p ≫ n gene expression datasets. The results show that when the lasso estimate is combined in an ensemble with filter feature selection methods, the prediction accuracy of the chosen predictors for each target variable improves.
  • Keywords
    bioinformatics; estimation theory; feature selection; genetics; genomics; learning (artificial intelligence); parameter estimation; P ≫ N gene expression datasets; bioinformatics field; candidate predictors; ensemble feature selection method; filter feature selection method; interpretability; lasso estimate feature selection method; lasso estimate regularization; machine learning; tuning parameter; variable selection problem; Accuracy; Bioinformatics; Correlation; Databases; Gene expression; Prostate cancer; Tuning; filter feature selection methods; Affymetrix GeneChips microarrays; Illumina microarrays; the lasso estimate;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Applications (ICMLA), 2013 12th International Conference on
  • Conference_Location
    Miami, FL
  • Type
    conf
  • DOI
    10.1109/ICMLA.2013.116
  • Filename
    6786093