Title :
Mining the URLs: An Approach to Measure the Similarities between Named-Entities
Author :
Liu, Hui ; Zhao, Jinglei ; Lu, Ruzhan
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai
Abstract :
Measuring the similarity between named-entities is a foundational task for a number of practical applications, such as information extraction and query expansion. In this paper the authors study similarity measures between two named-entities. In particular, the authors are interested in fine-grained similarity differences between named-entities within one class, such as "novelist". Unlike previous work on named-entity associations, this paper proposes a novel Web mining method that depends solely on the URLs returned by a search engine when the named-entities are used as queries. The problem of similarity between two named-entities is thus converted to that of similarity between two URL sets. Evaluations show that this method achieves good results in two experiments.
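The abstract does not say which set-similarity measure the authors apply to the two URL sets. As an illustration only, the following minimal sketch assumes Jaccard overlap on lightly normalized URLs. The function names `normalize` and `url_set_similarity`, the normalization rules, and the choice of Jaccard are all assumptions made for this example and are not taken from the paper. The URL lists would come from a search engine queried with each named-entity; that retrieval step is not shown.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host plus path (hypothetical normalization rule).

    Drops the scheme, query string, and fragment, lowercases the host,
    removes a leading "www.", and strips trailing slashes from the path.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def url_set_similarity(urls_a, urls_b) -> float:
    """Jaccard overlap of two URL collections (an assumed measure).

    Returns |A intersect B| / |A union B| over normalized URLs,
    or 0.0 when both collections are empty.
    """
    a = {normalize(u) for u in urls_a}
    b = {normalize(u) for u in urls_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Under these assumptions, two named-entities whose queries return many of the same pages score close to 1, and entities with disjoint result lists score 0.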
Keywords :
Internet; data mining; query processing; search engines; URL; Web mining method; information extraction; named-entities; query expansion; search engine; similarity measure; Application software; Computer science; Data mining; Information analysis; Natural language processing; Pattern analysis; Search engines; Taxonomy; Uniform resource locators; Web mining;
Conference_Title :
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-0-7695-3161-8
Electronic_ISBN :
978-0-7695-3161-8
DOI :
10.1109/ICICIC.2008.362