Title of article :
K-mer-Based Motif Analysis in Insect Species across Anopheles, Drosophila, and Glossina Genera and Its Application to Species Classification
Author/Authors :
Cserhati, Matyas Department of Genetics - Cell Biology & Anatomy - University of Nebraska Medical Center - Omaha, USA , Xiao, Peng Department of Genetics - Cell Biology & Anatomy - University of Nebraska Medical Center - Omaha, USA , Guda, Chittibabu Department of Genetics - Cell Biology & Anatomy - University of Nebraska Medical Center - Omaha, USA
Abstract :
Short k-mer sequences from DNA are both conserved and diverged across species owing to their functional significance in
speciation, which enables their use in many species classification algorithms. In the present study, we developed a methodology to
analyze the DNA k-mers of whole genome, 5′ UTR, intron, and 3′ UTR regions from 58 insect species belonging to three genera of
Diptera that include Anopheles, Drosophila, and Glossina. We developed an improved algorithm to predict and score k-mers based
on a scheme that normalizes k-mer scores in different genomic subregions. This algorithm takes advantage of the information
content of the whole genome as opposed to other algorithms or studies that analyze only a small group of genes. Our algorithm
uses k-mers of lengths 7–9 bp for the whole genome, 5′ and 3′ UTR regions as well as the intronic regions. Taxonomical relationships based on the whole-genome k-mer signatures showed that species of the three genera clustered together quite visibly.
We also improved the scoring and filtering of these k-mers for accurate species identification. The whole-genome k-mer content
correlation algorithm showed that species within a single genus correlated tightly with each other as compared to other genera.
The genomes of two Aedes and one Culex species were also analyzed to demonstrate how newly sequenced species can be classified
using the algorithm. Furthermore, working with several dozen species has enabled us to assign a whole-genome k-mer signature
for each of the 58 Dipteran species by making all-to-all pairwise comparisons of the k-mer content. These signatures were used to
compare the similarity between species and to identify clusters of species displaying similar signatures.
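
The all-to-all comparison of whole-genome k-mer content described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify their scoring, normalization, or filtering scheme, so the sketch assumes plain relative k-mer frequencies compared with a Pearson correlation. All function names (`kmer_counts`, `frequency_vector`, `pearson`, `pairwise_correlations`) and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any window with a non-ACGT base."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    return counts


def frequency_vector(counts, k):
    """Relative frequency of every possible k-mer, in a fixed lexicographic order."""
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return [counts[km] / total if total else 0.0 for km in all_kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(genomes, k):
    """All-to-all correlation of k-mer frequency vectors, keyed by name pairs."""
    vectors = {
        name: frequency_vector(kmer_counts(seq, k), k)
        for name, seq in genomes.items()
    }
    names = sorted(vectors)
    return {(a, b): pearson(vectors[a], vectors[b]) for a in names for b in names}


if __name__ == "__main__":
    # Toy sequences standing in for genomes; real input would be parsed
    # from whole-genome FASTA files, and k would be 7-9 as in the abstract.
    toy_genomes = {
        "species_A": "ACGTACGTTGCAACGTACGGTACGT",
        "species_B": "ACGTACGTTGCAACGAACGGTACGT",
        "species_C": "TTTTGGGGCCCCAAAATTTTGGGGC",
    }
    for pair, r in pairwise_correlations(toy_genomes, k=3).items():
        print(pair, round(r, 3))
```

Under these assumptions, a newly sequenced genome would be assigned to the cluster whose members it correlates with most strongly. The vector length grows as 4^k, so k = 9 yields 262,144 entries per species.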
Keywords :
K-mer-Based , DNA , Drosophila , Glossina
Journal title :
Computational and Mathematical Methods in Medicine