Title :
A Random Walk Framework to Compute Textual Semantic Similarity: A Unified Model for Three Benchmark Tasks
Author :
Yazdani, Majid ; Popescu-Belis, Andrei
Author_Institution :
Idiap Res. Inst., EPFL, Lausanne, Switzerland
Abstract :
A network of concepts is built from Wikipedia documents, and a random walk approach is used to compute distances between documents in this network. Three algorithms for distance computation are considered: hitting/commute time, personalized PageRank, and truncated visiting probability. In parallel, four types of weighted links in the document network are considered: actual hyperlinks, lexical similarity, common category membership, and common template use. The resulting network is used to solve three benchmark semantic tasks (word similarity, paraphrase detection between sentences, and document similarity) by mapping each pair of items to the network and then computing a distance between their representations. The model reaches state-of-the-art performance on each task, showing that the constructed network is a general, valuable resource for semantic similarity judgments.
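Two of the random-walk measures named in the abstract can be illustrated on a toy graph. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the restart value of 0.15, the iteration count, and the three-node weight matrix are all hypothetical, and the truncated visiting probability is taken here in its simplest form (the chance of reaching the target at least once within a fixed number of steps), which may differ from the exact definition used in the paper.

```python
def row_normalize(weights):
    """Turn a non-negative link-weight matrix into a row-stochastic transition matrix."""
    transition = []
    for row in weights:
        total = sum(row)
        if total == 0:
            raise ValueError("every node needs at least one outgoing link")
        transition.append([w / total for w in row])
    return transition


def step(dist, transition):
    """Advance a probability distribution by one random-walk step (dist * transition)."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def truncated_visiting_probability(transition, source, target, max_steps):
    """Probability that a walk from `source` reaches `target` within `max_steps` steps."""
    if source == target:
        return 1.0
    dist = [0.0] * len(transition)
    dist[source] = 1.0
    visited = 0.0
    for _ in range(max_steps):
        dist = step(dist, transition)
        visited += dist[target]  # mass reaching the target for the first time
        dist[target] = 0.0       # absorb it so it is not counted again
    return visited


def personalized_pagerank(transition, source, restart=0.15, iterations=100):
    """Power iteration for a walk that jumps back to `source` with probability `restart`."""
    rank = [0.0] * len(transition)
    rank[source] = 1.0
    for _ in range(iterations):
        walked = step(rank, transition)
        rank = [(1.0 - restart) * w for w in walked]
        rank[source] += restart
    return rank


# Hypothetical toy network: node 0 links to nodes 1 and 2, which both link back to 0.
toy_weights = [[0, 1, 1],
               [1, 0, 0],
               [1, 0, 0]]
toy_transition = row_normalize(toy_weights)
print(truncated_visiting_probability(toy_transition, 0, 1, 3))
print(personalized_pagerank(toy_transition, 0))
```

In a setting like the one the abstract describes, the weight matrix would combine the four link types (hyperlinks, lexical similarity, common categories, common templates); how those weights are combined is not stated in this record and is left open here.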
Keywords :
Web sites; probability; text analysis; Wikipedia documents; common category membership; common template use; commute time; distance computation; document network; document similarity; hitting time; hyperlinks; lexical similarity; paraphrase detection; personalized page rank; random walk framework; semantic similarity judgment; textual semantic similarity; truncated visiting probability; word similarity; Correlation; Electronic publishing; Encyclopedias; Humans; Internet; Semantics; Random Walk Algorithm; Textual Semantic Similarity; Wikipedia collaborative resource
Conference_Title :
Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on
Conference_Location :
Pittsburgh, PA
Print_ISBN :
978-1-4244-7912-2
Electronic_ISBN :
978-0-7695-4154-9
DOI :
10.1109/ICSC.2010.44