DocumentCode :
2198939
Title :
Unsupervised reduction of the dimensionality followed by supervised learning with a perceptron improves the classification of conditions in DNA microarray gene expression data
Author :
Conde, Lucia ; Mateos, Álvaro ; Herrero, Javier ; Dopazo, Joaquin
Author_Institution :
Bioinformatics Unit, Spanish Nat. Cancer Center, Madrid, Spain
fYear :
2002
fDate :
2002
Firstpage :
77
Lastpage :
86
Abstract :
This manuscript describes a combined approach of unsupervised clustering followed by supervised learning that provides an efficient classification of conditions in DNA array gene expression experiments (different cell lines including some cancer types, in the cases shown). First, the dimensionality of the dataset of gene expression profiles is reduced to a number of non-redundant clusters of co-expressing genes using an unsupervised clustering algorithm, the Self Organizing Tree Algorithm (SOTA), a hierarchical version of Self Organizing Maps (SOM). Then, the average values of these clusters are used to train a perceptron that produces a very efficient classification of the conditions. This way of reducing the dimensionality of the data set seems to perform better than previously proposed alternatives such as PCA. In addition, the weights that connect the gene clusters to the different experimental conditions can be used to assess the relative importance of the genes in defining these classes. Finally, Gene Ontology (GO) terms are used to infer a possible biological role for these groups of genes and to assess the validity of the classification from a biological point of view.
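The pipeline in the abstract (cluster genes, average each cluster per condition, train a perceptron on those averages) can be illustrated with a minimal sketch. This is not the authors' implementation: a plain k-means stands in for SOTA, the perceptron is a basic two-class version, and the expression matrix, labels, and all function names are made up for illustration.

```python
# Toy sketch of: gene clustering -> per-condition cluster averages -> perceptron.
# ASSUMPTIONS: k-means replaces SOTA; data, labels, and names are hypothetical.

def sqdist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_genes(profiles, k, iters=20):
    """Group gene profiles (rows) into k clusters; returns a cluster index per gene."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [profiles[0]]
    while len(centers) < k:
        centers.append(max(profiles, key=lambda p: min(sqdist(p, c) for c in centers)))
    assign = [0] * len(profiles)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in profiles]
        for j in range(k):
            members = [p for p, a in zip(profiles, assign) if a == j]
            if members:  # keep the old centre if a cluster happens to be empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def cluster_averages(profiles, assign, k):
    """One feature vector per condition: the mean expression of each gene cluster."""
    features = []
    for c in range(len(profiles[0])):
        row = []
        for j in range(k):
            vals = [p[c] for p, a in zip(profiles, assign) if a == j]
            row.append(sum(vals) / len(vals) if vals else 0.0)
        features.append(row)
    return features

def predict(w, b, x):
    """Threshold unit: class 1 if the weighted sum is positive, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Classic perceptron rule; stops early once every condition is classified."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            delta = t - predict(w, b, x)
            if delta != 0:
                errors += 1
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
        if errors == 0:
            break
    return w, b

# Hypothetical matrix: 6 genes (rows) x 4 conditions (columns).
expression = [
    [0.1, 0.2, 2.0, 2.1], [0.0, 0.3, 1.9, 2.2], [0.2, 0.1, 2.1, 2.0],  # up in class 1
    [2.0, 2.1, 0.1, 0.0], [1.9, 2.2, 0.2, 0.1], [2.1, 2.0, 0.0, 0.2],  # up in class 0
]
labels = [0, 0, 1, 1]  # hypothetical class of each condition

k = 2
assign = kmeans_genes(expression, k)
features = cluster_averages(expression, assign, k)
w, b = train_perceptron(features, labels)
preds = [predict(w, b, x) for x in features]
print("cluster per gene:", assign)
print("cluster weights:", w, "bias:", b)
print("predictions:", preds)
```

In this sketch the sign and magnitude of each entry in `w` play the role the abstract assigns to the connection weights: they indicate how strongly a gene cluster pushes a condition toward one class or the other.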
Keywords :
DNA; arrays; biology computing; cancer; cellular biophysics; genetics; learning (artificial intelligence); perceptrons; self-organising feature maps; trees (mathematics); DNA microarray gene expression data; cancer types; conditions classification improvement; dataset dimensionality reduction; experimental conditions; gene expression profiles; gene ontology terms; genes relative importance; hierarchical self organizing maps; possible biological role; supervised learning with perceptron; unsupervised dimensionality reduction; Biotechnology; Cancer; Clustering algorithms; DNA; Gene expression; Neural networks; Principal component analysis; Supervised learning; Support vector machine classification; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks for Signal Processing, 2002. Proceedings of the 2002 12th IEEE Workshop on
Print_ISBN :
0-7803-7616-1
Type :
conf
DOI :
10.1109/NNSP.2002.1030019
Filename :
1030019