Title :
Alternate biological sequence clustering using symbol table
Author :
Baridam, Barilee
Author_Institution :
Dept. of Comput. Sci., Univ. of Port Harcourt, Port Harcourt, Nigeria
Abstract :
Clustering is the identification of interesting distribution patterns and similarities (natural groupings, or clusters) within a collection of objects in a dataset. It is an unsupervised learning problem and can be distance-based or conceptual. In distance-based clustering, the similarity criterion is distance: objects belong to the same cluster if they are close according to a given distance measure. Conceptual clustering instead defines a concept common to all the objects in a cluster; objects are grouped by their fit to descriptive concepts rather than by a distance or similarity measure. This paper extends the use of the common symbol table to the clustering of biological sequences. Unlike conceptual clustering, the method does not depend on a concept; rather, it uses a table (hash table or list). The results obtained indicate the usefulness of the symbol table in biological sequence clustering.
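As a rough illustration of table-based grouping, the sketch below stores sequences in a hash table and treats each table entry as a cluster. The abstract does not say how table keys are derived, so the key used here (the first k symbols of each sequence, case-folded) is purely an assumption, and the names `cluster_by_symbol_table` and `k` are hypothetical. It is not the paper's algorithm.

```python
from collections import defaultdict


def cluster_by_symbol_table(sequences, k=3):
    """Group sequences that share the same table key.

    Assumption: the key is the first k symbols, upper-cased.
    The paper's actual key construction is not given in the abstract.
    """
    table = defaultdict(list)  # symbol table: key -> sequences in that cluster
    for seq in sequences:
        key = seq[:k].upper()
        table[key].append(seq)
    return dict(table)


clusters = cluster_by_symbol_table(
    ["ACGTAC", "ACGGTT", "TTGACA", "acgCCC", "TTGGGG"]
)
for key, members in clusters.items():
    print(key, members)
```

Each lookup and insertion is a single hash-table operation, so no pairwise distances are computed; whether this matches the paper's cost profile depends on its actual key scheme.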
Keywords :
biology computing; pattern clustering; statistical distributions; symbol manipulation; unsupervised learning; biological sequence clustering; distance-based clustering; distribution patterns; natural groupings; similarity criterion; symbol table; Algorithm design and analysis; Bioinformatics; Biological information theory; Clustering algorithms; Data mining; Gene expression; Heuristic algorithms
Conference_Title :
Science and Information Conference (SAI), 2013
Conference_Location :
London