• DocumentCode
    3461538
  • Title

    Gene Selection for Microarray Expression Data with Imbalanced Sample Distributions

  • Author

    Kamal, A.H.M. ; Zhu, Xingquan ; Narayanan, Ramaswamy

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
  • fYear
    2009
  • fDate
    3-5 Aug. 2009
  • Firstpage
    3
  • Lastpage
    9
  • Abstract
    Microarray expression data, which contain the expression levels of a large number of simultaneously observed genes, have been used in many scientific and clinical studies. Because of the high dimensionality of such data, selecting a small number of genes has been shown to be beneficial for tasks such as building prediction models for the molecular classification of cancers. Traditional gene selection methods, however, fail to take the sample distribution into consideration. Because samples are scarce, it is very common in biomedical research to have severely biased data distributions, with one class of examples (e.g., diseased samples) having significantly fewer samples than other classes (e.g., normal samples). Sample sets with biased distributions require special attention when identifying the genes responsible for a particular disease. In this paper, we propose three filtering techniques, Higher Weight (HW), Differential Minority Repeat (DMR), and Balanced Minority Repeat (BMR), to identify genes relevant to fatal diseases from biased microarray expression data. Experimental comparisons with the traditional ReliefF method on five microarray datasets demonstrate the effectiveness of the proposed methods in selecting informative genes from microarray expression data with biased sample distributions.
  • Keywords
    cancer; filtering theory; genomics; statistical analysis; ReliefF method comparison; balanced minority repeat filtering technique; biased data distributions; cancer molecular classification; differential minority repeat filtering technique; gene expression levels; gene selection methods; higher weight filtering technique; imbalanced sample distributions; microarray expression data; prediction model building; Bioinformatics; Biology computing; Cancer; Diseases; Filters; Genomics; Humans; Neoplasms; Predictive models; Proteins;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics, Systems Biology and Intelligent Computing, 2009. IJCBS '09. International Joint Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3739-9
  • Type

    conf

  • DOI
    10.1109/IJCBS.2009.117
  • Filename
    5260766