Title :
Efficiently mining gene expression data via a novel parameterless clustering method
Author :
Tseng, Vincent S. ; Kao, Ching-Pin
Author_Institution :
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
Abstract :
Clustering analysis has long been an important research topic in machine learning because of its wide range of applications. In recent years, it has also become a valuable tool for in-silico analysis of microarray and other gene expression data. Although a number of clustering methods have been proposed, they have difficulty meeting the requirements of automation, high quality, and high efficiency at the same time. In this paper, we propose a novel, parameterless, and efficient clustering algorithm, the correlation search technique (CST), which is well suited to the analysis of gene expression data. The unique feature of CST is that it incorporates validation techniques into the clustering process, so that high-quality clustering results can be produced on the fly. Experimental evaluation shows that CST substantially outperforms other clustering methods in terms of clustering quality, efficiency, and automation on both synthetic and real data sets.
Keywords :
biology computing; data mining; genetics; learning (artificial intelligence); molecular biophysics; statistical analysis; correlation search technique; gene expression; machine learning; parameterless clustering method; Algorithm design and analysis; Automation; Clustering algorithms; Clustering methods; Fungi; Partitioning algorithms; Psychology; clustering; mining methods and algorithms; Algorithms; Cell Cycle; Cluster Analysis; Computational Biology; Gene Expression Profiling; Models, Genetic; Models, Statistical; Oligonucleotide Array Sequence Analysis; ROC Curve; Sequence Analysis, DNA; Software; Yeasts
Journal_Title :
IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOI :
10.1109/TCBB.2005.56