DocumentCode :
1694245
Title :
Whole-Genome Phylogeny by Virtue of Unic Subwords
Author :
Comin, Matteo ; Verzotto, Davide
Author_Institution :
Dept. of Inf. Eng., Univ. of Padova, Padova, Italy
fYear :
2012
Firstpage :
190
Lastpage :
194
Abstract :
With the progress of modern sequencing technologies, a number of complete genomes are now available. Traditional motif discovery tools cannot handle this massive amount of data; therefore, complete genomes can be compared only with ad hoc methods. In this work we propose a distance function based on subword compositions, which extends the Average Common Subword (ACS) approach of Ulitsky et al. ACS is closely related to the cross entropy estimated between two entire genome sequences, and thus to a set of "independent" subwords, namely their redundant common subwords. We then filter these redundant common subwords by means of underlying-paired motifs, which relate regions of the two genome sequences to each other. By construction, this set of motifs is linear in the size of the input and non-overlapping; we call the selected motifs underlying-paired irredundant common subwords, or simply unic subwords. Preliminary results show the validity of our method and suggest novel computational approaches for analyzing the evolution of genomes.
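The ACS baseline that this work extends can be sketched as follows. This is a minimal illustration of ACS only, not of the unic-subword filtering proposed here. It assumes one common formulation from Ulitsky et al.: for each position i of A, l(i) is the length of the longest substring starting at A[i] that also occurs in B; L(A,B) is the mean of l(i); d(A,B) = log|B| / L(A,B) - 2 log|A| / |A|; and the symmetric distance averages d(A,B) and d(B,A). The function names are hypothetical, and the naive substring search stands in for the suffix-tree or suffix-array machinery a real tool would use.

```python
import math


def matching_statistics(a: str, b: str) -> list[int]:
    """For each position i of a, return the length of the longest prefix of
    a[i:] that occurs somewhere in b (naive search, for illustration only)."""
    ms = []
    for i in range(len(a)):
        k = 0
        # Substrings of b are closed under taking prefixes, so greedy
        # extension stops exactly at the longest match.
        while i + k < len(a) and a[i:i + k + 1] in b:
            k += 1
        ms.append(k)
    return ms


def acs_distance(a: str, b: str) -> float:
    """Asymmetric ACS-style distance d(a, b) under the assumed formulation."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    avg_len = sum(matching_statistics(a, b)) / len(a)  # L(a, b)
    if avg_len == 0:
        return math.inf  # no shared symbols at all
    return math.log(len(b)) / avg_len - 2 * math.log(len(a)) / len(a)


def acs_symmetric(a: str, b: str) -> float:
    """Symmetrized distance: mean of d(a, b) and d(b, a)."""
    return (acs_distance(a, b) + acs_distance(b, a)) / 2
```

For example, `acs_symmetric("ACGTACGTAC", "ACGTACGTAA")` comes out much smaller than `acs_symmetric("ACGTACGTAC", "TTTTGGGGCC")`, because the first pair shares long common subwords. Note that under this formulation the distance of a sequence to itself is not exactly zero.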
Keywords :
bioinformatics; genetics; genomics; information filtering; sequences; text analysis; ACS; average common subword approach; common subwords; cross entropy; distance function; genome sequences; modern sequencing technologies; motif discovery tools; subword compositions; underlying-paired motifs; unic subwords; whole-genome phylogeny; Bioinformatics; Genomics; Phylogeny; USA Councils; Vegetation; Viruses (medical); Pattern discovery; whole genome comparison phylogeny;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database and Expert Systems Applications (DEXA), 2012 23rd International Workshop on
Conference_Location :
Vienna
ISSN :
1529-4188
Print_ISBN :
978-1-4673-2621-6
Type :
conf
DOI :
10.1109/DEXA.2012.10
Filename :
6327424