Title :
Using concept structures for efficient document comparison and location
Author :
Edmonds, Andrew N.
Author_Institution :
Scientio LLC, Woburn Sands
fDate :
March 1 2007-April 5 2007
Abstract :
A method is discussed for comparing and locating similar documents in a computationally efficient manner by making use of inferred concept statistics, rather than word frequencies. This novel technique uses natural language structures to create a short 'concept signature' vector, which locates a document in 'concept space'. Similar documents can be located in large corpora in O(log(n)) time by making use of this space for indexing. Results from trials with reference and real world data sets are presented, along with a comparison of the method's document similarity characteristics and the cosine metric.
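For orientation, the cosine metric named in the abstract as the point of comparison can be sketched as follows. This is a minimal illustration of the standard word-frequency baseline, not the paper's concept-signature method, whose details the abstract does not give. The function name `cosine_similarity` and the whitespace tokenisation are assumptions made for this example only.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two word-frequency vectors.

    Assumption: words are obtained by lowercasing and splitting on
    whitespace; a real system would use a proper tokeniser.
    """
    freq_a = Counter(text_a.lower().split())
    freq_b = Counter(text_b.lower().split())

    # Only words present in both documents contribute to the dot product.
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[w] * freq_b[w] for w in shared)

    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))

    # An empty document has no direction, so define its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Comparing one query against every document this way costs time linear in the corpus size, which is the cost the abstract's O(log(n)) indexed lookup is meant to avoid.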
Keywords :
computational complexity; document handling; natural languages; statistical analysis; concept signature vector; concept statistics; concept structures; document comparison; document location; document similarity; natural language structures; Computational intelligence; Content management; Data mining; Humans; Indexing; Natural languages; Performance analysis; Performance evaluation; Statistics; Thesauri
Conference_Titel :
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0705-2
DOI :
10.1109/CIDM.2007.368879