DocumentCode :
3060585
Title :
Statistical and Biological Validation Methods in Cluster Analysis of Gene Expression
Author :
Sunaga, Daniele Yumi ; Nievola, Julio Cesar ; Ramos, Milton Pires
Author_Institution :
Inst. de Biologia Molecular do Parana, Curitiba
fYear :
2007
fDate :
13-15 Dec. 2007
Firstpage :
494
Lastpage :
499
Abstract :
Data clustering methods have become standard techniques in the analysis of gene expression data. They are used in a variety of tasks, ranging from simple data pre-treatment for subsequent analysis to the identification of important information, such as gene function and/or the participation of a group of genes in a given biological process. Data clustering methods also save the biologist money and time compared with obtaining this type of information without the aid of intelligent computational methods. This work aims to guide the choices needed to obtain the best possible solution from data clustering. To do so, algorithms from different approaches were used: the k-means and SOM algorithms, which belong to the unidimensional approach, and the SAMBA algorithm, a bidimensional approach. Statistical and biological validation methods were employed to choose the best data clustering solution. The results show that the statistical validation methods rarely agreed with the biological validation method. Furthermore, some advantages of the SOM algorithm over the k-means algorithm were observed. Use of the bidimensional algorithm SAMBA revealed dataset structure not identified by the unidimensional algorithms. It was possible to associate meaningful biological information with genes of unknown function. All the content of this work, including all the data clustering results and detailed analysis, is available at the URL http://www.ppgia.pucpr.br/~nievola/clusteranalysis.
Keywords :
biology computing; data structures; pattern clustering; statistical analysis; bidimensional algorithm; biological information; biological validation methods; data clustering methods; dataset structure; gene expression cluster analysis; gene expression data; k-means algorithm; statistical validation methods; Aggregates; Biological processes; Biology computing; Clustering algorithms; Clustering methods; Computational intelligence; Gene expression; Information analysis; Statistics; Uniform resource locators;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
Conference_Location :
Cincinnati, OH
Print_ISBN :
978-0-7695-3069-7
Type :
conf
DOI :
10.1109/ICMLA.2007.55
Filename :
4457278