DocumentCode
2190625
Title
Diffusion Maps: A Superior Semantic Method to Improve Similarity Join Performance
Author
Hawashin, Bilal ; Fotouhi, Farshad ; Grosky, William
Author_Institution
Dept. of Comput. Sci., Wayne State Univ., Detroit, MI, USA
fYear
2010
fDate
13-13 Dec. 2010
Firstpage
9
Lastpage
16
Abstract
This paper adopts the diffusion maps method for joining long string values, such as paper abstracts, movie summaries, product descriptions, and user feedback, to improve the performance of existing similarity join methods. We show that using attributes with long string values to detect similar records significantly improves overall similarity join performance. Most databases include attributes with long string values, and existing similarity join methods are not efficient at finding similarity among the values of these long attributes. Multiple methods are compared according to their ability to join long string values semantically.
Keywords
eigenvalues and eigenfunctions; learning (artificial intelligence); records management; string matching; SoftTfIdf; cosine similarity; diffusion maps; eigenvectors; latent semantic indexing; semantic method; similar records; similarity join methods; similarity join performance; string values; long attributes; semantic similarity join method; supervised learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location
Sydney, NSW
Print_ISBN
978-1-4244-9244-2
Electronic_ISBN
978-0-7695-4257-7
Type
conf
DOI
10.1109/ICDMW.2010.77
Filename
5693276