• DocumentCode
    3271695
  • Title

    TOP-MATA: A Max-First traversal method for top-K cosine similarity search

  • Author

    Zhu, Shiwei ; Wu, Junjie ; Xia, Guoping ; Li, Limin

  • Author_Institution
    Sch. of Econ. & Manage., Beihang Univ., Beijing, China
  • fYear
    2010
  • fDate
    28-30 June 2010
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Recent years have witnessed an increased interest in computing cosine similarities between documents (or commodities). Most previous studies require the user to specify a minimum similarity threshold in order to perform a cosine similarity search. In practice, however, it is usually difficult for users to provide an appropriate threshold. In this paper, we instead propose searching for the top-K most strongly related pairs of objects as measured by cosine similarity. Specifically, we first define the cosine similarity measure from the association analysis point of view and identify the monotone property of an upper bound of the cosine measure, then exploit a Max-First traversal strategy to develop the TOP-MATA algorithm. Compared with the previous TOP-DATA method, TOP-MATA has the advantage of saving the computations for false-positive item pairs. Finally, experimental results demonstrate the computational efficiency of the algorithm.
  • Keywords
    data mining; document handling; search problems; TOP-MATA; documents; max-first traversal method; top-K cosine similarity search; Aircraft; Algorithm design and analysis; Association rules; Bioinformatics; Computational efficiency; Data mining; Databases; Pattern analysis; Sampling methods; Upper bound; Anti-Monotone Property; Association Analysis; Cosine Similarity; Interestingness Measure;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Service Systems and Service Management (ICSSSM), 2010 7th International Conference on
  • Conference_Location
    Tokyo
  • Print_ISBN
    978-1-4244-6485-2
  • Type

    conf

  • DOI
    10.1109/ICSSSM.2010.5530100
  • Filename
    5530100
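
The abstract's idea of a threshold-free top-K search pruned by an upper bound can be illustrated with a minimal sketch. This is not the TOP-MATA algorithm itself: the record does not describe the Max-First traversal, so the code below simply enumerates all pairs. It assumes the common association-analysis form of cosine for two items, supp(A,B) / sqrt(supp(A) * supp(B)), and uses the bound sqrt(min/max) of the two supports, which holds because supp(A,B) cannot exceed the smaller support. The function name `top_k_cosine_pairs` and the input layout are hypothetical.

```python
import heapq
from itertools import combinations
from math import sqrt


def top_k_cosine_pairs(transactions, k):
    """Illustrative top-K pair search with upper-bound pruning (not TOP-MATA)."""
    if k <= 0:
        return []

    # Map each item to the set of transaction ids that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets.setdefault(item, set()).add(tid)

    heap = []  # min-heap of (cosine, pair) holding the best k pairs so far
    for a, b in combinations(sorted(tidsets), 2):
        sa, sb = len(tidsets[a]), len(tidsets[b])
        # supp(a, b) <= min(sa, sb), hence cosine <= sqrt(min / max).
        upper = sqrt(min(sa, sb) / max(sa, sb))
        if len(heap) == k and upper <= heap[0][0]:
            continue  # this pair cannot beat the current k-th best
        cos = len(tidsets[a] & tidsets[b]) / sqrt(sa * sb)
        if len(heap) < k:
            heapq.heappush(heap, (cos, (a, b)))
        elif cos > heap[0][0]:
            heapq.heapreplace(heap, (cos, (a, b)))
    return sorted(heap, reverse=True)


# Assumed toy data: each transaction is a list of item labels.
transactions = [["a", "b"], ["a", "b"], ["a", "c"], ["b"], ["c", "d"]]
result = top_k_cosine_pairs(transactions, 2)
```

No similarity threshold is supplied by the caller; the k-th best cosine found so far acts as a threshold that tightens as the search proceeds, which is the kind of saving the abstract attributes to bound-based pruning.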