• DocumentCode
    1691937
  • Title
    A novel computational framework for fast distributed computing and knowledge integration for microarray gene expression data analysis
  • Author
    Sethi, Prerna; Leangsuksun, Chokchai Box
  • Author_Institution
    Comput. Sci. Program, Louisiana Tech. Univ., Ruston, LA, USA
  • Volume
    2
  • fYear
    2006
  • Abstract
    Rapid technological advancements in microarray analysis continue to generate enormous amounts of genomic data. However, neither hardware nor software computational capabilities have kept pace with this drastic increase. This paper presents a novel framework designed to achieve fast, robust, and accurate (biologically significant) multi-class classification of gene expression data using distributed knowledge discovery and computational integration routines, specifically for cancer applications. The proposed paradigm consists of the following key computational steps: (a) preprocess the gene expression data by normalization and discretization, (b) partition the data using two methods, overlapped windows and adaptive selection, (c) perform association rule discovery on the partitioned data spaces using the FP-growth method, (d) integrate the derived association rules on distributed processor nodes using a novel knowledge integration algorithm, (e) further prune the rules to reduce dimensionality using parametric significance estimation, and (f) cluster the remaining rules using a novel clustering algorithm for enhanced visualization and interpretation of the discovered gene rule sets.
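    Steps (a) and (b) of the abstract can be illustrated with a minimal sketch. The record does not give the paper's actual discretization thresholds, window size, or overlap, so the function names, the three-level coding (-1, 0, 1), and all parameters below are assumptions for illustration only, not the authors' method. Each resulting window of gene indices could then be handed to a separate node for rule mining.

```python
from typing import List, Sequence


def discretize(values: Sequence[float], low: float, high: float) -> List[int]:
    """Map each expression value to -1 (below low), 1 (above high), or 0.

    The thresholds are hypothetical; the abstract does not specify them.
    """
    return [-1 if v < low else 1 if v > high else 0 for v in values]


def overlapped_windows(n_items: int, window: int, overlap: int) -> List[range]:
    """Split indices 0..n_items-1 into windows that share `overlap` indices.

    Assumed scheme: fixed-size windows advanced by (window - overlap).
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    windows: List[range] = []
    start = 0
    while start < n_items:
        end = min(start + window, n_items)
        windows.append(range(start, end))
        if end == n_items:  # last window reached the final index
            break
        start += step
    return windows
```

    For example, `overlapped_windows(10, 4, 2)` yields four windows of four indices each, with neighbouring windows sharing two indices, so associations between genes near a partition boundary still fall inside at least one window.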
  • Keywords
    biology computing; cancer; data analysis; data mining; genetics; pattern clustering; FP-growth method; distributed knowledge discovery; enhanced visualization; knowledge integration algorithm; microarray gene expression data analysis; multiclass classification; novel clustering algorithm; novel computational framework; overlapped window; parametric significance estimation; preprocessing normalization; Association rules; Bioinformatics; Clustering algorithms; Distributed computing; Gene expression; Genomics; Hardware; Partitioning algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Advanced Information Networking and Applications, 2006. AINA 2006. 20th International Conference on
  • ISSN
    1550-445X
  • Print_ISBN
    0-7695-2466-4
  • Type
    conf
  • DOI
    10.1109/AINA.2006.44
  • Filename
    1620447