DocumentCode
660888
Title
Fast Information Retrieval and Social Network Mining via Cosine Similarity Upper Bound
Author
Weizhong Zhao ; Martha, V.S. ; Gang Chen ; Xiaowei Xu
Author_Institution
Coll. of Inf. Eng., Xiangtan Univ., Xiangtan, China
fYear
2013
fDate
8-14 Sept. 2013
Firstpage
940
Lastpage
943
Abstract
Similarity search is a key function in many applications, including databases, pattern recognition, and recommendation systems. In this paper, we first propose the ε-query, a similarity search based on the popular cosine similarity, for information retrieval and social network analysis. In contrast to traditional similarity search, an ε-query returns the results whose cosine similarity with the query is larger than a threshold ε. The major contribution of this paper is an efficient ε-query processing algorithm that uses an upper bound on cosine similarity for binary data. Our evaluation on two of the largest publicly available real datasets, ClueWeb09 and Twitter, demonstrated that the proposed method could achieve a speedup of several orders of magnitude over the traditional approach. Finally, we applied the proposed method to information retrieval on ClueWeb and to finding community structures in Twitter; the outcome further demonstrated the effectiveness of the proposed method.
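To illustrate how an upper bound can prune an ε-query over binary data, here is a minimal sketch that assumes a standard length-based bound, which is not necessarily the bound proposed in the paper. For binary vectors x and y stored as sets of nonzero indices, cos(x, y) = |x ∩ y| / sqrt(|x|·|y|) ≤ min(|x|, |y|) / sqrt(|x|·|y|) = sqrt(min(|x|, |y|) / max(|x|, |y|)). Whenever this bound is at most ε, the candidate cannot satisfy cos > ε, so the exact intersection is skipped. The function names (cosine_binary, length_upper_bound, epsilon_query) and the set-based representation are hypothetical illustrations.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine_binary(a: Set[int], b: Set[int]) -> float:
    """Exact cosine similarity of two binary vectors given as sets of nonzero indices."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def length_upper_bound(na: int, nb: int) -> float:
    """Hypothetical length-based bound (assumption, not the paper's bound):
    |a & b| <= min(na, nb), so cos <= min/sqrt(na*nb) = sqrt(min/max)."""
    if na == 0 or nb == 0:
        return 0.0
    return math.sqrt(min(na, nb) / max(na, nb))


def epsilon_query(
    query: Set[int], data: Dict[str, Set[int]], eps: float
) -> Tuple[List[str], int]:
    """Return ids whose cosine with `query` is strictly larger than `eps`,
    plus the number of exact similarity computations actually performed."""
    results: List[str] = []
    exact = 0
    nq = len(query)
    for key, vec in data.items():
        # O(1) check on set sizes: if even the bound cannot exceed eps, skip.
        if length_upper_bound(nq, len(vec)) <= eps:
            continue
        exact += 1
        if cosine_binary(query, vec) > eps:
            results.append(key)
    return results, exact


if __name__ == "__main__":
    data = {"a": {1, 2, 3}, "b": set(range(1, 13)), "c": {1, 2}, "d": {7, 8, 9}}
    print(epsilon_query({1, 2, 3}, data, 0.6))
```

In the usage example, item "b" has 12 nonzeros against a 3-element query, so its bound is sqrt(3/12) = 0.5 ≤ 0.6 and it is discarded without computing an intersection. Only three of the four candidates need an exact cosine, and the query returns "a" and "c". A production version would more likely index vectors by size (or use inverted lists) so that pruned candidates are never visited at all.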
Keywords
data mining; query processing; social networking (online); ε-query processing algorithm; ClueWeb09; Twitter; cosine similarity upper bound; databases; fast information retrieval; pattern recognition; recommendation systems; similarity search; social network mining; Communities; Complexity theory; Image edge detection; Twitter; Upper bound;
fLanguage
English
Publisher
ieee
Conference_Titel
Social Computing (SocialCom), 2013 International Conference on
Conference_Location
Alexandria, VA
Type
conf
DOI
10.1109/SocialCom.2013.147
Filename
6693444