DocumentCode
375644
Title
Applying the branch and bound technique to document similarity search
Author
Furuse, Kazutaka ; Miura, Takayuki ; Ishikawa, Masahiro ; Chen, Hanxiong ; Ohbo, Nobuo
Author_Institution
Inst. of Inf. Sci. & Electron., Univ. of Tsukuba, Japan
Volume
1
fYear
2001
fDate
2001
Firstpage
331
Abstract
This paper proposes a new mechanism for document similarity search that uses an indexing structure called signature tables. Signature tables were originally invented for similarity search over market basket data; in this paper we apply them to document data. Because the characteristics of document data differ markedly from those of market basket data, similarity search performs unsatisfactorily when the mechanism is applied naively to document data. We explain why the naive application reduces efficiency and propose techniques for improving performance. Simulation results on a real document data set show that the proposed mechanism achieves good performance.
Keywords
text analysis; tree searching; document similarity search; indexing structure; market basket data; signature tables; similarity search; Consumer electronics; Data mining; Indexing; Information science; Internet; Transaction databases; Web sites;
fLanguage
English
Publisher
IEEE
Conference_Titel
Communications, Computers and Signal Processing, 2001. PACRIM. 2001 IEEE Pacific Rim Conference on
Conference_Location
Victoria, BC
Print_ISBN
0-7803-7080-5
Type
conf
DOI
10.1109/PACRIM.2001.953590
Filename
953590