DocumentCode
3271695
Title
TOP-MATA: A Max-First traversal method for top-K cosine similarity search
Author
Zhu, Shiwei ; Wu, Junjie ; Xia, Guoping ; Li, Limin
Author_Institution
Sch. of Econ. & Manage., Beihang Univ., Beijing, China
fYear
2010
fDate
28-30 June 2010
Firstpage
1
Lastpage
5
Abstract
Recent years have witnessed increased interest in computing cosine similarities between documents (or commodities). Most previous studies require a minimum similarity threshold to be specified before cosine similarity search can be performed. In practice, however, it is usually difficult for users to choose an appropriate threshold. In this paper, we instead propose to search for the top-K most strongly related pairs of objects, as measured by cosine similarity. Specifically, we first define the cosine similarity measure from the association analysis point of view and identify the monotone property of an upper bound of the cosine measure; we then exploit a Max-First traversal strategy to develop the TOP-MATA algorithm. Compared with the previous TOP-DATA method, TOP-MATA has the advantage of avoiding computations for false-positive item pairs. Finally, experimental results demonstrate the computational efficiency of the algorithm.
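The abstract only outlines the approach. The Python sketch below illustrates the general idea of upper-bound pruning for top-K cosine search over binary transaction data; it is not the authors' TOP-MATA implementation. It assumes the common association-analysis form cos(A,B) = supp(A,B) / sqrt(supp(A) * supp(B)) and the bound cos(A,B) <= sqrt(supp(B) / supp(A)) when supp(A) >= supp(B), and it interprets "Max-First" as evaluating candidate pairs in descending order of that bound via a heap. The function name `top_k_cosine` and the input layout (a list of item lists) are hypothetical.

```python
import heapq
import math


def top_k_cosine(transactions, k):
    """Return up to k (cosine, item_a, item_b) triples, highest cosine first.

    Hypothetical sketch: pairs are evaluated in descending order of the
    upper bound sqrt(supp(b) / supp(a)), and the search stops once the
    largest remaining bound cannot beat the current k-th best cosine.
    """
    # Support counts and transaction-id sets for each item.
    support, tids = {}, {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            support[item] = support.get(item, 0) + 1
            tids.setdefault(item, set()).add(tid)

    # Sort items by descending support, so for i < j we have supp(i) >= supp(j).
    items = sorted(support, key=lambda x: (-support[x], x))

    def upper_bound(i, j):
        # supp(a, b) <= supp(b), hence cos(a, b) <= sqrt(supp(b) / supp(a)).
        return math.sqrt(support[items[j]] / support[items[i]])

    # Max-heap (negated bounds) holding, for each item i, its next unexplored
    # partner j. The bound is non-increasing in j, so the heap top is the
    # largest bound among all pairs not yet evaluated.
    candidates = []
    for i in range(len(items) - 1):
        heapq.heappush(candidates, (-upper_bound(i, i + 1), i, i + 1))

    best = []  # min-heap of the current top-k (cosine, a, b) triples
    while candidates:
        neg_ub, i, j = heapq.heappop(candidates)
        if len(best) == k and -neg_ub <= best[0][0]:
            break  # no remaining pair can exceed the k-th best cosine
        a, b = items[i], items[j]
        co = len(tids[a] & tids[b])  # exact co-occurrence count supp(a, b)
        cos = co / math.sqrt(support[a] * support[b])
        if len(best) < k:
            heapq.heappush(best, (cos, a, b))
        elif cos > best[0][0]:
            heapq.heapreplace(best, (cos, a, b))
        if j + 1 < len(items):
            heapq.heappush(candidates, (-upper_bound(i, j + 1), i, j + 1))

    return sorted(best, reverse=True)


if __name__ == "__main__":
    tx = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
    for cos, a, b in top_k_cosine(tx, 2):
        print(f"{a}-{b}: {cos:.3f}")
```

Because each item keeps only its next unexplored partner in the heap, pairs whose bound already falls below the K-th best exact cosine are never evaluated, which is the kind of saving on false-positive pairs the abstract describes.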
Keywords
data mining; document handling; search problems; TOP-MATA; documents; max-first traversal method; top-K cosine similarity search; Aircraft; Algorithm design and analysis; Association rules; Bioinformatics; Computational efficiency; Data mining; Databases; Pattern analysis; Sampling methods; Upper bound; Anti-Monotone Property; Association Analysis; Cosine Similarity; Interestingness Measure;
fLanguage
English
Publisher
IEEE
Conference_Title
Service Systems and Service Management (ICSSSM), 2010 7th International Conference on
Conference_Location
Tokyo
Print_ISBN
978-1-4244-6485-2
Type
conf
DOI
10.1109/ICSSSM.2010.5530100
Filename
5530100