Title of article :
A linguistic approach to classification of bacterial genomes
Author/Authors :
Volkovich, Zeev and Kirzhner, Valery and Barzily, Zeev and Hosid, Sergey and Korenblat, Katerina
Issue Information :
Journal with serial number, year 2010
Pages :
11
From page :
1083
To page :
1093
Abstract :
In the present paper, 188 prokaryote genomes are classified by separately calculating the compositional spectra for the coding and the non-coding parts of the genomes. For each subsequence, the compositional spectrum is transformed into the corresponding point in a vector space. This enables the categorization of genomes into meaningful groups by a formal method. Repeated clustering performed for the coding and the non-coding genome parts makes it possible to estimate the true number of the genome clusters. The method we propose is based on a new application of external cluster validation indexes and on the misclassified quantities obtained in the process of repeated clustering. Besides, we have constructed additional data embedding into the appropriate Euclidean space only on the basis of the distances between compositional spectra. Biological evaluation of the results obtained for the 4-letter and the 2-letter alphabets substantiates the appropriateness of the resulting cluster-based classification.
Keywords :
Genome clustering, cluster validation, compositional spectra method
Journal title :
PATTERN RECOGNITION
Serial Year :
2010
Record number :
1733283