Author :
Sherif, F.F. ; El Hefnawi, Mahmoud ; Kadah, Yasser
Author_Institution :
Biomed. Eng. Dept., Cairo Univ., Cairo, Egypt
Abstract :
Global outbreaks of human influenza arise from influenza A viruses with novel hemagglutinin (HA) molecules to which humans have no immunity. Understanding the origin and evolution of HA genes is therefore of particular importance. Here, genomic signatures of the HA protein in different hosts were identified and associative classification for host-typing was conducted. We performed multiple-sequence alignment, detected the most statistically significant differences among the human, avian and swine groups of sequences using VESPA, and then applied class association rule mining to identify conserved amino acid positions that are specific to host species, called signatures. We applied strict thresholds to select only markers that are highly preserved over time in the isolates from each influenza virus host. The two-sample sequence logo server was also used to identify and confirm significant variations between the hosts. Host-specific signatures were created by scanning 1500 HA sequences from human, swine and avian influenza A viruses. A total of 9, 31, 11, 6, 22, and 31 most informative positions out of 560 amino acid residues yielded significant differences for Avian vs. Human, Human vs. Avian, Human vs. Swine, Swine vs. Human, Avian vs. Swine, and Swine vs. Avian, respectively. Residues 438K, 458N and 286V were associated with avian, human and swine hosts, respectively, with support and confidence of (90.7% and 79.5%), (82.8% and 92.9%) and (51.4% and 98%), respectively. Host-specific class association rules aid in the prediction of prognostic biomarkers and improve the accuracy of prognosis.
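The support and confidence figures quoted for rules such as "438K implies avian" can be illustrated with a minimal sketch. It assumes the textbook definitions (support is the fraction of all sequences that carry the residue and belong to the host; confidence is the fraction of sequences carrying the residue that belong to the host), which may differ from the exact definitions used in the paper. The function name `rule_metrics` and the toy data are hypothetical and are not taken from the study.

```python
def rule_metrics(sequences, hosts, position, residue, host):
    """Support and confidence of the rule (position = residue) -> host.

    Assumes textbook association-rule definitions, not necessarily the
    paper's. `position` is a 0-based column index into aligned sequences
    of equal length.
    """
    if len(sequences) != len(hosts):
        raise ValueError("sequences and hosts must have the same length")

    total = len(sequences)
    with_residue = 0  # sequences carrying the residue at this position
    with_both = 0     # ... that also come from the given host

    for seq, seq_host in zip(sequences, hosts):
        if seq[position] == residue:
            with_residue += 1
            if seq_host == host:
                with_both += 1

    support = with_both / total if total else 0.0
    confidence = with_both / with_residue if with_residue else 0.0
    return support, confidence


# Toy aligned sequences, made up for illustration (not real HA data).
toy_sequences = ["AKN", "AKD", "GKN", "GRN", "ARD", "GRD"]
toy_hosts = ["avian", "avian", "avian", "human", "human", "swine"]

support, confidence = rule_metrics(toy_sequences, toy_hosts, 1, "K", "avian")
print(f"support={support:.3f} confidence={confidence:.3f}")
```

In this toy alignment, residue K at column 1 occurs in three of the six sequences, all of them avian, so the rule has a support of 0.5 and a confidence of 1.0 under the assumed definitions.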
Keywords :
bioinformatics; data mining; genomics; microorganisms; pattern classification; proteins; sequences; HA genes; VESPA; amino acid conserving positions; associative classification; associative rule mining; avian influenza A viruses; genomic signatures; hemagglutinin molecules; hemagglutinin protein; host-specific class association rules; host-specific signatures; host-typing; multiple-sequence alignment; prognostic biomarker prediction; sequence logo server; swine influenza A viruses; Bioinformatics; Genomics; Humans; Polymers; Influenza; signatures and Class association rules;