DocumentCode
2958867
Title
Finding the most similar documents across multiple text databases
Author
Yu, Clement ; Liu, King-Lup ; Wu, Wensheng ; Meng, Weiyi ; Rishe, Naphtali
Author_Institution
Dept. of Electr. Eng. & Comput. Sci., Illinois Univ., Chicago, IL, USA
fYear
1999
fDate
1999
Firstpage
150
Lastpage
162
Abstract
We present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to the order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of other databases, the methodology will guarantee the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies are presented to retrieve documents from the identified databases. Experimental results are given to illustrate the relative performance of different strategies.
Keywords
database management systems; information retrieval; search engines; text analysis; database ranking; document retrieval; most similar documents; multiple text databases; relative performance; statistical method; Australia; Computer networks; Database systems; ISDN; Indexing; Information retrieval; Information systems; Internet; Machine learning; Transaction databases
fLanguage
English
Publisher
IEEE
Conference_Titel
Research and Technology Advances in Digital Libraries, 1999. Proceedings. IEEE Forum on
Conference_Location
Baltimore, MD
ISSN
1092-9959
Print_ISBN
0-7695-0219-9
Type
conf
DOI
10.1109/ADL.1999.777710
Filename
777710