DocumentCode :
2036987
Title :
Text retrieval by using k-word proximity search
Author :
Sadakane, Kunihiko ; Imai, Hiroshi
Author_Institution :
Dept. of Inf. Sci., Tokyo Univ., Japan
fYear :
1999
fDate :
1999
Firstpage :
183
Lastpage :
188
Abstract :
When searching a huge collection of documents, we often specify several keywords and use conjunctive queries to narrow the search result. Although the retrieved documents contain all of the keywords, the positions of the keywords are usually not considered. As a result, the search result contains some meaningless documents. It is therefore effective to rank documents according to the proximity of the keywords within them. This ranking can be regarded as a kind of text data mining. We propose two algorithms for finding documents in which all given keywords appear close to one another. One is based on the plane-sweep algorithm and the other on a divide-and-conquer approach. Both algorithms run in O(n log n) time, where n is the number of occurrences of the given keywords. We run the plane-sweep algorithm on a large collection of HTML files and verify its effectiveness.
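The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea of a left-to-right sweep over keyword occurrences, not the authors' exact method. It assumes each keyword comes with a list of integer positions in one document, and it returns the shortest interval that contains every keyword at least once. The function name `smallest_cover` and the input layout are hypothetical, chosen for illustration. This simple version takes the minimum over all k keywords at every step, so it costs O(nk) after the merge rather than matching the O(n log n) bound stated above.

```python
import heapq


def smallest_cover(position_lists):
    """Return (start, end) of the shortest interval that contains at least
    one position from every list, or None if that is impossible.

    position_lists[i] holds the positions of keyword i (hypothetical layout).
    """
    if not position_lists or any(not plist for plist in position_lists):
        return None

    # Merge all occurrences into a single left-to-right sweep,
    # tagging each position with the index of its keyword.
    events = heapq.merge(*[
        [(pos, kw) for pos in sorted(plist)]
        for kw, plist in enumerate(position_lists)
    ])

    last_seen = {}  # keyword index -> most recent position swept so far
    best = None
    for pos, kw in events:
        last_seen[kw] = pos
        if len(last_seen) == len(position_lists):
            # Every keyword has been seen; the tightest interval ending
            # at pos starts at the oldest of the most recent occurrences.
            start = min(last_seen.values())
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)
    return best


if __name__ == "__main__":
    print(smallest_cover([[1, 10, 20], [4, 12], [9, 30]]))
```

A ranking along the lines the abstract describes could then order documents by the length of the returned interval, with shorter intervals ranked higher; that scoring rule is an assumption here and is not taken from the record.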
Keywords :
data mining; divide and conquer methods; full-text databases; hypermedia markup languages; information retrieval; HTML; conjunctive queries; divide-and-conquer approach; document searching; k-word proximity search; keywords; plane-sweep algorithm; rank documents; text data mining; text retrieval; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database Applications in Non-Traditional Environments, 1999. (DANTE '99) Proceedings. 1999 International Symposium on
Conference_Location :
Kyoto
Print_ISBN :
0-7695-0496-5
Type :
conf
DOI :
10.1109/DANTE.1999.844958
Filename :
844958