Title :
Vector model improvement using suffix trees
Author :
Jan Martinovic;Tomas Novosad;Vaclav Snasel
Author_Institution :
Faculty of Electrical Engineering and Computer Science, VŠB - Technical University of Ostrava, Czech Republic
Abstract :
There are many ways to search for documents in document collections. Existing methods rely on Boolean, vector, probabilistic and other models to represent documents, queries, and the rules and procedures that determine the correspondence between user requests and documents. Each of these models has limitations that prevent a user from finding all relevant documents: the system returns many irrelevant documents, and some relevant documents are missed altogether. This article proposes a new method that uses suffix trees to improve vector queries. The method treats a document as a set of phrases (sentences), not just as a set of words. A sentence carries a specific semantic meaning because its words are ordered, which is an advantage over treating a document as a bag of words.
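The contrast between phrases and a bag of words can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a naive word-level suffix trie (an uncompressed stand-in for a suffix tree, quadratic in sentence length), and the names `Node`, `build_suffix_trie`, `shared_phrases` and `bag_of_words`, as well as the sample documents, are hypothetical.

```python
from collections import Counter


class Node:
    """One node of a word-level suffix trie (an uncompressed suffix tree)."""

    def __init__(self):
        self.children = {}  # word -> Node
        self.docs = set()   # ids of documents containing the phrase ending here


def build_suffix_trie(docs):
    """Insert every word-level suffix of every sentence of every document.

    `docs` maps a document id to a list of sentences (plain strings).
    """
    root = Node()
    for doc_id, sentences in docs.items():
        for sentence in sentences:
            words = sentence.lower().split()
            for start in range(len(words)):
                node = root
                for word in words[start:]:
                    node = node.children.setdefault(word, Node())
                    node.docs.add(doc_id)
    return root


def shared_phrases(root, min_words=2):
    """Return {phrase: doc ids} for phrases occurring in at least two documents."""
    found = {}
    stack = [(root, ())]
    while stack:
        node, phrase = stack.pop()
        if len(phrase) >= min_words and len(node.docs) >= 2:
            found[" ".join(phrase)] = set(node.docs)
        for word, child in node.children.items():
            stack.append((child, phrase + (word,)))
    return found


def bag_of_words(sentences):
    """Word counts with all ordering information discarded."""
    return Counter(w for s in sentences for w in s.lower().split())


# Hypothetical sample collection: d1 and d2 use the same words in a different order.
docs = {
    "d1": ["the dog bites the man"],
    "d2": ["the man bites the dog"],
    "d3": ["yesterday the dog bites the man again"],
}
trie = build_suffix_trie(docs)
phrases = shared_phrases(trie, min_words=3)

print(bag_of_words(docs["d1"]) == bag_of_words(docs["d2"]))
print(sorted(phrases))
```

In this sketch d1 and d2 are indistinguishable as bags of words, yet they share no phrase of three or more words, while d1 and d3 share the whole sentence of d1. A production system would more plausibly use a compressed suffix tree built in linear time; that choice is an assumption here, not something the abstract specifies.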
Keywords :
"Computer science","Information retrieval","Clustering methods","Couplings","Indexing","Internet"
Conference_Title :
Digital Information Management, 2007. ICDIM '07. 2nd International Conference on
Print_ISBN :
978-1-4244-1475-8
DOI :
10.1109/ICDIM.2007.4444220