DocumentCode :
2461680
Title :
A Novel Biclustering Algorithm for Discovering Value-Coherent Overlapping δ-Biclusters
Author :
Das, Chandra ; Maji, Pradipta ; Chattopadhyay, Samiran
Author_Institution :
Dept. of Comput. Sci. & Eng., Netaji Subhash Eng. Coll., Kolkata
fYear :
2008
fDate :
14-17 Dec. 2008
Firstpage :
148
Lastpage :
156
Abstract :
The biclustering method is a very useful tool for analyzing gene expression data when some genes have multiple functions and the experimental conditions under which expression is measured are diverse. It focuses on finding a subset of genes and a subset of experimental conditions that together exhibit coherent behavior. A large number of biclustering algorithms have been developed for analyzing gene expression data. Most of them find exclusive biclusters, which is inappropriate in the biological context. Since biological processes are not independent of each other, many genes participate in multiple different processes. Hence, nonexclusive biclustering algorithms are required for finding highly overlapping biclusters. In this regard, a novel overlapping biclustering algorithm is presented here to find overlapping biclusters of larger volume with mean squared residue lower than a given threshold. The proposed method consists of two phases. First, a set of highly coherent seeds is generated using a two-way k-medoids algorithm, where mutual information is used as the similarity measure instead of Euclidean distance. The seeds are then iteratively adjusted (enlarged or degenerated) by adding or removing genes and conditions based on a new quantitative index. In effect, the proposed method provides highly overlapping coherent biclusters with mean squared residue lower than a given threshold. Some quantitative indices are introduced for evaluating the quality of the generated biclusters. The quality of the biclusters found by the proposed approach is discussed, and the results are compared to those reported by existing methods. In general, the proposed approach shows excellent performance at finding patterns in gene expression data.
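The abstract relies on the mean squared residue as its coherence criterion but does not define it. The sketch below assumes the standard Cheng and Church definition: for a submatrix with gene set I and condition set J, the average over all cells of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ, a_Ij, and a_IJ are the row, column, and overall means of the submatrix. The function name `mean_squared_residue` and the toy matrix are hypothetical and are not taken from the paper.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    Assumes the standard Cheng and Church definition; the paper's own
    variant, if any, is not given in the abstract.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    # Row means, column means, and the overall mean of the submatrix.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Average the squared residue over every cell.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy expression matrix (hypothetical values): rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [0.0, 8.0, 1.0, 5.0],
]

# The first three genes differ only by a constant shift over the first
# three conditions, so this submatrix is perfectly value-coherent.
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])

# The full matrix has no such additive pattern.
noisy = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
```

Under this definition a perfectly additive submatrix scores zero (up to floating-point rounding), and a bicluster is accepted when its score falls below the chosen threshold.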
Keywords :
bioinformatics; data analysis; data mining; genetics; pattern clustering; Euclidean distance; biclustering algorithm; biological process; gene expression data analysis; gene expression measurement; mean squared residue; quantitative index; similarity measure; two-way k-medoids algorithm; value-coherent overlapping delta-bicluster discovery; Algorithm design and analysis; Biological processes; Computer science; Data analysis; Data engineering; Educational institutions; Gene expression; Information analysis; Information technology; Machine intelligence;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computing and Communications, 2008. ADCOM 2008. 16th International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4244-2962-2
Electronic_ISBN :
978-1-4244-2963-9
Type :
conf
DOI :
10.1109/ADCOM.2008.4760441
Filename :
4760441