DocumentCode :
814794
Title :
Rough-Fuzzy C-Medoids Algorithm and Selection of Bio-Basis for Amino Acid Sequence Analysis
Author :
Maji, Pradipta ; Pal, Sankar K.
Volume :
19
Issue :
6
fYear :
2007
fDate :
6/1/2007 12:00:00 AM
Firstpage :
859
Lastpage :
872
Abstract :
In most pattern recognition algorithms, amino acids cannot be used directly as inputs since they are nonnumerical variables. They therefore need encoding prior to input. In this regard, the bio-basis function maps a nonnumerical sequence space to a numerical feature space. It is designed using an amino acid mutation matrix. One of the important issues for the bio-basis function is how to select the minimum set of bio-bases with maximum information. In this paper, we describe an algorithm, termed the rough-fuzzy c-medoids (RFCMdd) algorithm, to select the most informative bio-bases. It comprises a judicious integration of the principles of rough sets, fuzzy sets, the c-medoids algorithm, and the amino acid mutation matrix. While the membership function of fuzzy sets enables efficient handling of overlapping partitions, the concept of lower and upper bounds of rough sets deals with uncertainty, vagueness, and incompleteness in class definition. The concept of a crisp lower bound and a fuzzy boundary of a class, introduced in RFCMdd, enables efficient selection of the minimum set of the most informative bio-bases. Some new indices are introduced for quantitatively evaluating the quality of the selected bio-bases. The effectiveness of the proposed algorithm, along with a comparison with other algorithms, has been demonstrated on different types of protein data sets.
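The abstract does not give the update rules, so the following is only a minimal sketch of the rough-fuzzy c-medoids idea under stated assumptions, not the paper's implementation. It assumes a precomputed, symmetric dissimilarity matrix `dist` (in the paper the dissimilarities would derive from the amino acid mutation matrix; here a toy 1-D distance stands in), standard fuzzy c-medoids memberships with fuzzifier `m`, a hypothetical threshold `delta` that sends an object either to the crisp lower approximation of its best cluster or to the fuzzy boundary of its two best clusters, and a hypothetical weight `w` that balances the crisp and boundary terms when choosing each new medoid. All function and parameter names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def memberships(dist: Matrix, medoids: List[int], m: float) -> List[List[float]]:
    """Fuzzy c-medoids memberships u[i][j] of object j in cluster i (assumed form)."""
    n, c = len(dist), len(medoids)
    u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        d = [dist[j][v] for v in medoids]
        at_medoid = [i for i in range(c) if d[i] == 0.0]
        if at_medoid:
            # Object coincides with a medoid: full membership in that cluster.
            u[at_medoid[0]][j] = 1.0
            continue
        for i in range(c):
            u[i][j] = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1.0)) for k in range(c))
    return u


def rough_partition(u: List[List[float]], delta: float) -> Tuple[List[List[int]], List[List[int]]]:
    """Crisp lower approximations and fuzzy boundaries (needs at least 2 clusters)."""
    c, n = len(u), len(u[0])
    lower: List[List[int]] = [[] for _ in range(c)]
    boundary: List[List[int]] = [[] for _ in range(c)]
    for j in range(n):
        best, second = sorted(range(c), key=lambda i: u[i][j], reverse=True)[:2]
        if u[best][j] - u[second][j] <= delta:
            # Ambiguous object: it joins the boundary of both top clusters.
            boundary[best].append(j)
            boundary[second].append(j)
        else:
            # Clear winner: the object goes to that cluster's lower approximation.
            lower[best].append(j)
    return lower, boundary


def update_medoid(dist, i, current, lower, boundary, u, m, w) -> int:
    """Pick the member minimising weighted crisp + fuzzy-boundary dissimilarity."""
    members = lower[i] + boundary[i]
    if not members:
        return current  # empty cluster: keep the old medoid

    def cost(q: int) -> float:
        crisp = sum(dist[q][j] for j in lower[i])
        fuzzy = sum(u[i][j] ** m * dist[q][j] for j in boundary[i])
        if lower[i] and boundary[i]:
            return w * crisp + (1.0 - w) * fuzzy
        return crisp if lower[i] else fuzzy

    return min(members, key=cost)


def rfcmdd(dist, init_medoids, m=2.0, delta=0.2, w=0.9, max_iter=100):
    """Iterate memberships -> rough partition -> medoid update until stable."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        u = memberships(dist, medoids, m)
        lower, boundary = rough_partition(u, delta)
        new = [update_medoid(dist, i, medoids[i], lower, boundary, u, m, w)
               for i in range(len(medoids))]
        if new == medoids:
            break
        medoids = new
    # Recompute the partition so it matches the returned medoids.
    u = memberships(dist, medoids, m)
    lower, boundary = rough_partition(u, delta)
    return medoids, lower, boundary


if __name__ == "__main__":
    pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 2.6]
    dist = [[abs(a - b) for b in pts] for a in pts]
    print(rfcmdd(dist, [0, 3]))
```

Running it prints `([1, 4], [[0, 1, 2], [3, 4, 5]], [[6], [6]])`: the two tight groups form crisp lower approximations around the medoids 0.1 and 5.1, while the ambiguous point 2.6 stays in the fuzzy boundary of both clusters.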
Keywords :
Algorithm design and analysis; Amino acids; Encoding; Fuzzy sets; Genetic mutations; Partitioning algorithms; Pattern recognition; Rough sets; Uncertainty; Upper bound; bioinformatics; c-medoids algorithm; data mining; fuzzy sets; rough sets;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2007.190609
Filename :
4161905