DocumentCode
2651791
Title
Similarity Calculation with Length Delimiting Dictionary Distance
Author
Burkovski, Andre ; Klenk, Sebastian ; Heidemann, Gunther
Author_Institution
Dept. for Intell. Syst., Univ. of Stuttgart, Stuttgart, Germany
fYear
2011
fDate
7-9 Nov. 2011
Firstpage
856
Lastpage
864
Abstract
The Normalized Compression Distance (NCD) has gained considerable interest in pattern recognition as a similarity measure applicable to unstructured data from very different domains, such as text, DNA sequences, or images. NCD uses existing compression programs such as gzip to compute the similarity between objects. NCD has unique features: it does not require any prior knowledge, data preprocessing, feature extraction, domain adaptation, or parameter settings. Further, the NCD can be applied to symbolic data and raw signals alike. In this paper we decompose the NCD and introduce a method to measure compression-based similarity without the need to use compression. The Length Delimiting Dictionary Distance (LD3) takes the one component essential to compression methods, dictionary generation, and strips the NCD of all dispensable components. The LD3 performs "compression-based pattern recognition without compression", keeping all of the above benefits of the NCD while achieving better speed and recognition rates. We first review the NCD, then introduce LD3 as the "essence" of NCD, and finally evaluate the LD3 on language tree experiments, authorship recognition, and genome phylogeny data.
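As an illustration of the NCD that the abstract reviews, the following is a minimal sketch using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. It assumes Python's zlib (the DEFLATE algorithm that gzip also uses) as the compressor; the function names and sample inputs are hypothetical and chosen only for this example. It does not implement LD3, whose details are not given in this record.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample inputs: two similar English texts and pseudo-random noise.
text_a = b"the quick brown fox jumps over the lazy dog. " * 20
text_b = b"the quick brown fox leaps over the lazy cat. " * 20
rng = random.Random(0)  # fixed seed so the example is reproducible
noise = bytes(rng.randrange(256) for _ in range(900))

similar = ncd(text_a, text_b)
dissimilar = ncd(text_a, noise)
print(f"NCD(text_a, text_b) = {similar:.3f}")
print(f"NCD(text_a, noise)  = {dissimilar:.3f}")
```

Values near 0 indicate that one input adds little new information to the other; values near 1 indicate that the inputs share almost no compressible structure. With a real compressor the result is only approximately bounded by 0 and 1.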
Keywords
data mining; dictionaries; pattern recognition; trees (mathematics); NCD; compression-based similarity; feature extraction; genome phylogeny data; language tree experiments; length delimiting dictionary distance; normalized compression distance; parameter-free data mining; Complexity theory; Compression algorithms; Compressors; Image coding; Measurement; dictionary-based compression; similarity metric
fLanguage
English
Publisher
IEEE
Conference_Titel
Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
Conference_Location
Boca Raton, FL
ISSN
1082-3409
Print_ISBN
978-1-4577-2068-0
Electronic_ISBN
1082-3409
Type
conf
DOI
10.1109/ICTAI.2011.133
Filename
6103424