Title of article :
Using alignment-free methods as preprocessing stage to classification whole genomes
Author/Authors :
Abed Alhadi Shanan, Najah Computer Department - Science College for Women - University of Babylon, Babylon, Iraq , Attya Lafta, Hussein Computer Department - Science College for Women - University of Babylon, Babylon, Iraq , Alrashid, Sura Z. College of Information Technology - University of Babylon, Babylon, Iraq
Abstract :
In bioinformatics, the study of genetics is a popular research discipline. Bioinformatics systems
depend on the degree of similarity between biological data, which are based on DNA sequences or
raw sequencing reads. In the preprocessing stage, there are several methods for measuring
similarity between sequences. The most popular are alignment-based and alignment-free methods,
which are applied to determine the degree of functional matching between sequences of DNA
nucleotides, ribosomal RNA, or proteins. Alignment-based methods pose a great challenge in terms
of computational complexity and also lengthen the time needed to search for a match, especially
when the data are heterogeneous and very large, and thus the classification accuracy decreases in
the post-processing stage. Alignment-free methods overcome these challenges when measuring the
distance between sequences. The data set used consists of 1000 genomes obtained from the National
Center for Biotechnology Information (NCBI). After eliminating missing and irrelevant values, 860
genomes remain, ready to be segmented into words by k-mer analysis, after which the frequency of
each word is counted for each query. The size of a word depends on the value of k. In this paper
we used values of k = 3, ..., 8; in each iteration, the frequency of each word is counted.
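
The word-counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names count_kmers and kmer_profiles are hypothetical, and the choices to use overlapping words and to skip words containing symbols other than A, C, G, and T are assumptions that the abstract does not state.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_kmers(sequence, k):
    """Count overlapping words of length k in a nucleotide sequence."""
    sequence = sequence.upper()
    counts = Counter()
    # A sequence of length n contains n - k + 1 overlapping words.
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        # Assumption: words with ambiguous symbols (for example N) are skipped.
        if set(word) <= VALID_BASES:
            counts[word] += 1
    return counts


def kmer_profiles(sequence, k_values=range(3, 9)):
    """Return one frequency table per word size, for k = 3 to 8 by default."""
    return {k: count_kmers(sequence, k) for k in k_values}
```

Calling kmer_profiles on a genome string returns a dictionary keyed by k, where each value maps a word to the number of times it occurs. Such tables could then be turned into fixed-length frequency vectors for a distance measure or a classifier, although the abstract does not say which one is used.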
Keywords :
16S RNA, DNA, k-mers
Journal title :
International Journal of Nonlinear Analysis and Applications