DocumentCode
2460709
Title
A Comparison Study of Virus Classification by Genome Sequences
Author
Wang, Jing-doo
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Asia Univ., Taichung, Taiwan
fYear
2011
fDate
24-26 Oct. 2011
Firstpage
270
Lastpage
273
Abstract
In this study, instead of traditional approaches to virus classification, we propose a novel approach in the vector space model that classifies viruses using two types of genome sequences, DNA and CDS. For DNA sequences, the k-mer approach was adopted for pattern extraction, and the entropy of each pattern's frequency distribution among classes was used for pattern weighting. For CDS sequences, pattern extraction was based on identifying distinctive protein functions formed by CDS clustering, and a weighting method similar to the tf * idf approach was proposed for these protein functions. The experimental data were downloaded from NCBI and comprised 1,877 viruses in 35 classes (virus families). The highest classification accuracies obtained with an SVM classifier were 94.7% and 91.3% for DNA and CDS sequences, respectively. This study not only proposes a novel approach to virus classification but also provides a new methodology for comparative genomic analysis.
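The DNA branch described in the abstract (k-mer extraction followed by entropy-based pattern weighting) can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the normalisation below (weight = 1 - H / log2(number of classes), computed on raw per-class counts) is an assumption, and the function names `kmer_counts`, `class_entropy`, and `entropy_weights` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA string."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def class_entropy(class_freqs):
    """Shannon entropy (bits) of one pattern's frequency distribution over classes."""
    total = sum(class_freqs)
    if total == 0:
        return 0.0
    probs = [f / total for f in class_freqs if f > 0]
    return -sum(p * math.log2(p) for p in probs)


def entropy_weights(class_to_seqs, k):
    """Weight each k-mer by how unevenly it is spread over classes.

    Assumed scheme (not confirmed by the source): 1 - H / log2(#classes),
    so a k-mer seen in a single class gets weight 1 and a k-mer spread
    evenly over all classes gets weight 0.
    """
    classes = sorted(class_to_seqs)
    per_class = {c: Counter() for c in classes}
    for c in classes:
        for s in class_to_seqs[c]:
            per_class[c].update(kmer_counts(s, k))
    patterns = set().union(*per_class.values())
    max_h = math.log2(len(classes)) if len(classes) > 1 else 1.0
    weights = {}
    for p in patterns:
        h = class_entropy([per_class[c][p] for c in classes])
        weights[p] = 1.0 - h / max_h
    return weights
```

Under these assumptions, each virus would be represented as a vector of weighted k-mer frequencies and passed to an SVM classifier. The CDS branch (clustering plus a tf * idf-like weight) is not sketched here because the abstract does not specify how it is computed.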
Keywords
DNA; biology computing; cellular biophysics; genomics; microorganisms; molecular biophysics; physiological models; proteins; support vector machines; CDS clustering; DNA sequence; SVM classifier; classification accuracy; comparative genomic analysis; genome sequences; k-mer approach; pattern extraction; pattern frequency distribution; pattern weighting; protein functions; vector space model; virus classification; Accuracy; Bioinformatics; DNA; Encoding; Genomics; Vectors; Viruses (medical); Comparative genomics; genome sequence; virus classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Bioengineering (BIBE), 2011 IEEE 11th International Conference on
Conference_Location
Taichung
Print_ISBN
978-1-61284-975-1
Type
conf
DOI
10.1109/BIBE.2011.47
Filename
6089838