Regional Information Center for Science and Technology - Statistical and Biological Validation Methods in Cluster Analysis of Gene Expression

DocumentCode :

3060585

Title :

Statistical and Biological Validation Methods in Cluster Analysis of Gene Expression

Author :

Sunaga, Daniele Yumi ; Nievola, Julio Cesar ; Ramos, Milton Pires

Author_Institution :

Inst. de Biologia Molecular do Parana, Curitiba

fYear :

2007

fDate :

13-15 Dec. 2007

Firstpage :

494

Lastpage :

499

Abstract :

Data clustering methods have become standard techniques in the analysis of gene expression data. They are used in a variety of tasks, ranging from simple data pretreatment for subsequent analysis to the identification of important information, such as gene function and/or the participation of a group of genes in a given biological process. Data clustering methods also save the biologist both money and the time that would be needed to obtain this type of information without the aid of intelligent computational methods. This work aims to guide the choices needed to get the best possible solution from data clustering. To do so, algorithms from different approaches were used: the k-means and SOM algorithms, which belong to the unidimensional approach, and the SAMBA algorithm, a bidimensional approach. Statistical and biological validation methods were employed to choose the best data clustering solution. The results showed that the statistical validation methods rarely agreed with the biological validation method. Furthermore, some advantages of the SOM algorithm over the k-means algorithm were observed. Use of the bidimensional algorithm SAMBA revealed dataset structure not identified by the unidimensional algorithms. It was possible to associate meaningful biological information with genes of unknown function. All the content of this work, including all the data clusterings and detailed analysis, is available at the URL http://www.ppgia.pucpr.br/~nievola/clusteranalysis.

Keywords :

biology computing; data structures; pattern clustering; statistical analysis; bidimensional algorithm; biological information; biological validation methods; data clustering methods; dataset structure; gene expression cluster analysis; gene expression data; k-means algorithm; statistical validation methods; Aggregates; Biological processes; Biology computing; Clustering algorithms; Clustering methods; Computational intelligence; Gene expression; Information analysis; Statistics; Uniform resource locators

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on

Conference_Location :

Cincinnati, OH

Print_ISBN :

978-0-7695-3069-7

Type :

conf

DOI :

10.1109/ICMLA.2007.55

Filename :

4457278

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3060585