Abstract:
Viruses, viroids and prions are the smallest infectious biological entities that depend on their host for replication. The number of pathogenic viruses is large and their impact on global human health is well documented. Currently, the International Committee on the Taxonomy of Viruses (ICTV) has classified 4379 virus species, while the National Center for Biotechnology Information Viral Genomes Resource (NCBI-VGR) database has mapped 617 705 proteins to eight large taxonomic groups. Despite these efforts, an automated approach for mapping the ICTV master list and its officially accepted virus naming to the NCBI-VGR's taxonomical classification is not available. With metagenomic sequencing, the discovery and naming of new viral species is likely to increase at least tenfold. Unfortunately, existing viral databases are not adequately prepared to scale, to maintain and automatically annotate ultra-high-throughput sequences, and to place this information into specific taxonomic categories. ORION-VIRCAT is a scalable and interoperable object-relational database designed to serve as a resource for the integration and verification of taxonomical classifications generated by the ICTV and NCBI-VGR. The current release (v1.0) of ORION-VIRCAT is implemented in PostgreSQL and has been extended to Oracle, MySQL and Sybase. ORION-VIRCAT automatically mapped and joined 617 705 entries from the NCBI-VGR to the viral naming of the ICTV. This detailed analysis revealed that 399 095 entries from the NCBI-VGR can be mapped to the ICTV classification and that one order, 10 families, 35 genera and 503 species listed in the ICTV disagree with the NCBI-VGR classification schema. Nevertheless, we were able to correct several discrepancies, mapping 234 000 additional entries.