DocumentCode
470027
Title
Discovering interchangeable words from string databases
Author
Alvarez, Marco A. ; Lim, SeungJin
Author_Institution
Dept. of Comput. Sci., Utah State Univ., Logan, UT
Volume
1
fYear
2007
fDate
28-31 Oct. 2007
Firstpage
25
Lastpage
30
Abstract
This paper presents a solution to the problem of finding interchangeable words in an input collection of strings. Interchangeable words are words that can replace one another in phrases or free text without changing the meaning. Under restricted conditions, pairs of interchangeable words can be useful for data deduplication, copy detection, and software localization, among other applications. Computing the degree of interchangeability requires both an accurate measure of the semantic similarity between pairs of words and a search for candidate pairs across the overall search space defined by the input collection. The proposed solution consists of a search method for candidate pairs based on the Levenshtein distance algorithm and a novel algorithm, SSA, for calculating the semantic similarity between words. The solution was implemented and tested in a real-world application involving the string message database of a software development company, where it was used to build an ontology with clusters of interchangeable words.
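The abstract does not say exactly how the Levenshtein distance is applied during the candidate search. The following is a minimal Python sketch under one hypothetical interpretation: the distance is computed at word level between pairs of messages, and two messages of equal length that differ by a single substituted word yield that word pair as a candidate. The helper names `levenshtein` and `candidate_pairs` are illustrative only, and the SSA semantic-similarity step is not shown.

```python
from itertools import combinations


def levenshtein(a, b):
    """Dynamic-programming edit distance between two sequences.

    Works on strings (character level) or on lists of tokens (word level).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def candidate_pairs(messages):
    """Hypothetical candidate search over a collection of messages.

    Assumption: a word-level distance of 1 between two equal-length
    messages means exactly one word was substituted; that word pair
    becomes a candidate for interchangeability.
    """
    pairs = set()
    tokenized = [m.lower().split() for m in messages]
    # Compares every pair of messages, so the cost is quadratic in their number.
    for a, b in combinations(tokenized, 2):
        if len(a) != len(b) or levenshtein(a, b) != 1:
            continue
        # Equal lengths plus distance 1 imply a single differing position.
        diff = [(x, y) for x, y in zip(a, b) if x != y]
        pairs.add(tuple(sorted(diff[0])))
    return pairs


if __name__ == "__main__":
    msgs = ["Cannot open file", "Cannot open document",
            "Unable to open file", "Cannot read file"]
    print(sorted(candidate_pairs(msgs)))
```

With these sample messages the search proposes both ("document", "file") and ("open", "read"). The second pair is clearly not interchangeable, which illustrates why candidates still have to be scored by a semantic-similarity measure before being clustered.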
Keywords
database management systems; word processing; Levenshtein distance algorithm; copy detection; data deduplication; interchangeable words; semantic similarity; software localization; string databases; string message database; Application software; Clustering algorithms; Computer science; Databases; Educational institutions; Marine animals; Ontologies; Programming; Search methods; Software testing;
fLanguage
English
Publisher
IEEE
Conference_Title
Digital Information Management, 2007. ICDIM '07. 2nd International Conference on
Conference_Location
Lyon
Print_ISBN
978-1-4244-1475-8
Electronic_ISBN
978-1-4244-1476-5
Type
conf
DOI
10.1109/ICDIM.2007.4444195
Filename
4444195