DocumentCode :
2049225
Title :
An Axiomatic Approach to the Notion of Similarity of Individual Sequences and Their Classification
Author :
Ziv, Jacob
Author_Institution :
Dept. of Electr. Eng., Technion - Israel Inst. of Technol., Haifa, Israel
fYear :
2011
fDate :
21-24 June 2011
Firstpage :
3
Lastpage :
7
Abstract :
An axiomatic approach to the notion of similarity of sequences, one that seems natural in many cases (e.g., phylogenetic analysis), is proposed. Although it is not assumed that the sequences are realizations of a probabilistic process (e.g., a variable-order Markov process), it is demonstrated that any classifier that fully complies with the proposed similarity axioms must be based on modeling the training data contained in a (long) individual training sequence via a suffix tree with no more than O(N) leaves (or, alternatively, a table with O(N) entries), where N is the length of the test sequence. Some common classification algorithms may be slightly modified to comply with the proposed axiomatic conditions and the resulting organization of the training data. This yields a formal justification for their good empirical performance without relying on any a priori (sometimes unjustified) probabilistic assumption. One such case is discussed in detail.
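For readers unfamiliar with classifying individual sequences without a probabilistic model, the following minimal sketch shows one related, well-known idea: Ziv–Merhav-style cross-parsing. It is swapped in here purely as an illustration and is not presented as the classifier analyzed in this paper. The test sequence is greedily split into the longest phrases that occur somewhere in a training sequence, and the class whose training sequence yields the fewest phrases is chosen. The function names `cross_parse_count` and `classify` and the dictionary-of-training-sequences input are hypothetical.

```python
from typing import Dict


def cross_parse_count(test: str, train: str) -> int:
    """Greedily split `test` into the longest phrases that occur in `train`.

    A symbol that never occurs in `train` becomes a one-symbol phrase.
    Fewer phrases means `test` is better "explained" by `train`.
    """
    count = 0
    i = 0
    n = len(test)
    while i < n:
        # Extend the current phrase while it still occurs somewhere in `train`.
        # (Naive substring search; a suffix tree over `train` would speed this up.)
        j = i + 1
        while j <= n and test[i:j] in train:
            j += 1
        # test[i:j-1] is the longest match; always consume at least one symbol.
        i = max(j - 1, i + 1)
        count += 1
    return count


def classify(test: str, training: Dict[str, str]) -> str:
    """Return the label whose training sequence parses `test` into the fewest phrases."""
    return min(training, key=lambda label: cross_parse_count(test, training[label]))
```

For example, with `training = {"A": "abababababab", "B": "ccdccdccdccd"}`, the test sequence `"abab"` parses into a single phrase against class A and four one-symbol phrases against class B, so `classify("abab", training)` returns `"A"`. The substring test above rescans the training sequence on every extension. A practical implementation would instead index the training sequence with a suffix tree or a similar structure, which is the kind of organization of the training data the abstract refers to.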
Keywords :
biology computing; data compression; pattern classification; sequences; trees (mathematics); O(N) leaves; axiomatic approach; classification algorithms; formal justification; phylogenetic training data; probabilistic process; sequence similarity; suffix tree; test sequence; training sequence; Data models; Information theory; Markov processes; Phylogeny; Probabilistic logic; Training; Training data; phylogenetics; universal classification; universal data-compression;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression, Communications and Processing (CCP), 2011 First International Conference on
Conference_Location :
Palinuro
Print_ISBN :
978-1-4577-1458-0
Electronic_ISBN :
978-0-7695-4528-8
Type :
conf
DOI :
10.1109/CCP.2011.29
Filename :
6061021