Finding the most similar documents across multiple text databases

Author

Yu, Clement ; Liu, King-Lup ; Wu, Wensheng ; Meng, Weiyi ; Rishe, Naphtali

Author_Institution

Dept. of Electr. Eng. & Comput. Sci., Illinois Univ., Chicago, IL, USA

fYear

1999

fDate

1999

Firstpage

150

Lastpage

162

Abstract

We present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to the order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of other databases, the methodology will guarantee the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies are presented to retrieve documents from the identified databases. Experimental results are given to illustrate the relative performance of different strategies.
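The two-step idea in the abstract (rank the databases, then retrieve in rank order) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify the ranking statistic or the retrieval strategies. The sketch assumes, hypothetically, that each database comes with an estimate of the highest similarity any of its documents has to the query, and that a `search` callable returns `(similarity, doc_id)` pairs. The names `top_n_across_databases`, `estimates`, `search`, and the toy data are all illustrative.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Doc = Tuple[float, str]  # (similarity to the query, document id)


def top_n_across_databases(
    estimates: Dict[str, float],
    search: Callable[[str], List[Doc]],
    n: int,
) -> List[Doc]:
    """Visit databases in ranked order and keep the n most similar documents."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Step 1 (assumed ranking rule): highest estimated best similarity first.
    ranked = sorted(estimates, key=lambda name: estimates[name], reverse=True)
    best: List[Doc] = []  # min-heap of the n best documents seen so far
    # Step 2: retrieve from the databases in that order.
    for name in ranked:
        # Stop once n documents are held and the weakest of them is at least
        # as similar as the estimate for the next database in the ranking.
        if len(best) == n and best[0][0] >= estimates[name]:
            break
        for doc in search(name):
            if len(best) < n:
                heapq.heappush(best, doc)
            elif doc > best[0]:
                heapq.heapreplace(best, doc)
    return sorted(best, reverse=True)


# Toy data (invented for illustration only).
toy = {
    "db_a": [(0.9, "a1"), (0.4, "a2")],
    "db_b": [(0.8, "b1"), (0.7, "b2")],
    "db_c": [(0.3, "c1")],
}
# Here the estimates are exact maxima; a real system would have to estimate them.
estimates = {name: max(s for s, _ in docs) for name, docs in toy.items()}
visited: List[str] = []


def search(name: str) -> List[Doc]:
    visited.append(name)
    return toy[name]


result = top_n_across_databases(estimates, search, 2)
```

In this toy run the third database is never queried, which mirrors the abstract's point that a good ranking lets retrieval stop early. The early stop is only safe when the estimates do not understate a database's best similarity, which is an assumption of this sketch.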

Keywords

database management systems; information retrieval; search engines; text analysis; database ranking; document retrieval; most similar documents; multiple text databases; relative performance; statistical method; Australia; Computer networks; Database systems; ISDN; Indexing; Information retrieval; Information systems; Internet; Machine learning; Transaction databases

fLanguage

English

Publisher

IEEE

Conference_Titel

Research and Technology Advances in Digital Libraries, 1999. Proceedings. IEEE Forum on

Conference_Location

Baltimore, MD

ISSN

1092-9959

Print_ISBN

0-7695-0219-9

Type

conf

DOI

10.1109/ADL.1999.777710

Filename

777710