Title :
Clustering gene expression data using Shannon's entropy
Author :
Mohanapriya, S. ; Elavarasi, S. Anitha ; Akilandeswari, J.
Author_Institution :
Dept. of Comput. Sci. & Eng., Sona Coll. of Technol., Salem, India
Abstract :
Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. The purpose of clustering gene expression data is to discover the natural structure of the data and to gain information about its distribution. Hierarchical clustering groups data objects into a tree of clusters. Traditional clustering algorithms use proximity measures that tend to identify clusters with spherical shapes and are sensitive to outliers. The most common proximity measures are Euclidean distance, Manhattan distance, and the Pearson correlation coefficient. In this paper, Shannon's entropy is used as the proximity measure. With this entropy, we are able to capture the local structure of the input dataset regardless of cluster shape, and the measure is much less sensitive to outliers. It also helps to reduce the time complexity involved in identifying the gene clusters. The characteristics of the gene clusters produced by this algorithm can be identified with the help of Gene Ontology (GO).
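For orientation, the proximity measures named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' algorithm: the abstract does not say how Shannon's entropy is turned into a proximity measure, so the code stops at the standard definition H = -sum(p * log2(p)) applied to a discretized expression profile. The function names and the equal-width binning step (`discretize`, `n_bins`) are assumptions introduced here for illustration.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan (city-block) distance between two expression profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither profile is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def discretize(profile, n_bins=3):
    """Hypothetical helper: map continuous values to equal-width bin indices."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in profile]


def shannon_entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    gene_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # made-up expression values
    gene_b = [0.5, 1.5, 2.5, 3.5, 4.5, 9.0]  # same trend with one outlier
    print(euclidean(gene_a, gene_b), manhattan(gene_a, gene_b))
    print(pearson(gene_a, gene_b))
    print(shannon_entropy(discretize(gene_a)))
```

The values in `gene_a` and `gene_b` are invented purely to exercise the functions. Entropy depends only on how often each discretized level occurs, not on the magnitude of individual values, which is one plausible reading of the claim that an entropy-based measure is less sensitive to outliers than a distance-based one.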
Keywords :
computational complexity; entropy; genetics; ontologies (artificial intelligence); pattern clustering; tree data structures; Euclidean distance; Manhattan distance; Pearson correlation coefficient; Shannon entropy; cluster tree; data distribution; gene clusters; gene expression data clustering; gene ontology; hierarchical clustering; natural data structures; time complexity; Clustering algorithms; Clustering methods; Data mining; Entropy; Gene expression; Ontologies; Shape; Gene Ontology; Gene expression data; Hierarchical Clustering; Shannon's entropy;
Conference_Title :
Recent Trends in Information Technology (ICRTIT), 2011 International Conference on
Conference_Location :
Chennai, Tamil Nadu
Print_ISBN :
978-1-4577-0588-5
DOI :
10.1109/ICRTIT.2011.5972412