Title :
Document comparison with a weighted topic hierarchy
Author :
Gelbukh, A. ; Sidorov, G. ; Guzmán-Arenas, A.
Author_Institution :
Nat. Language Lab., Nat. Polytech. Inst., Mexico City, Mexico
Abstract :
A method of document comparison based on a hierarchical dictionary of topics (concepts) is described. The hierarchical links in the dictionary carry weights that are used to detect the main topics of a document and to determine the similarity between two documents. The method allows comparison of documents that share no words literally but do share concepts, including documents written in different languages. It also allows comparison with respect to a specific “aspect”, i.e., a specific topic of interest (with its respective subtopics). A system classifier that uses this method for document classification and information retrieval is discussed.
Keywords :
information retrieval; visual databases; document classification; document comparison; hierarchical dictionary; system classifier; weighted topic hierarchy; Cities and towns; Dictionaries; Europe; Histograms; Laboratories; Natural languages; Read only memory; Statistical analysis; Vocabulary
Conference_Title :
Proceedings of the Tenth International Workshop on Database and Expert Systems Applications, 1999
Conference_Location :
Florence
Print_ISBN :
0-7695-0281-4
DOI :
10.1109/DEXA.1999.795247