DocumentCode
3663989
Title
Computing text similarity using Tree Edit Distance
Author
Grigori Sidorov;Helena Gómez-Adorno;Ilia Markov;David Pinto;Nahun Loya
Author_Institution
Center for Computing Research (CIC), Instituto Polité
fYear
2015
Firstpage
1
Lastpage
4
Abstract
In this paper, we propose applying the Tree Edit Distance (TED) to calculate the similarity between syntactic n-grams, which is then used to detect soft similarity between texts. Computing text similarity is a basic task in many natural language processing problems, and it remains an open research field. Syntactic n-grams are text features extracted from dependency trees and used to construct a Vector Space Model. Soft similarity is an application of the Vector Space Model that takes the similarity between features into account. First, we discuss the advantages of applying the TED to syntactic n-grams. Then, we present a procedure based on the TED and syntactic n-grams for calculating soft similarity between texts.
Keywords
"Syntactics","Natural language processing","Computational modeling","Heuristic algorithms","Information retrieval","Semantics","Cost function"
Publisher
IEEE
Conference_Titel
2015 Annual Conference of the North American Fuzzy Information Processing Society (NAFIPS) held jointly with 2015 5th World Conference on Soft Computing (WConSC)
Type
conf
DOI
10.1109/NAFIPS-WConSC.2015.7284129
Filename
7284129