• DocumentCode
    2958867
  • Title
    Finding the most similar documents across multiple text databases
  • Author
    Yu, Clement; Liu, King-Lup; Wu, Wensheng; Meng, Weiyi; Rishe, Naphtali
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Illinois Univ., Chicago, IL, USA
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    150
  • Lastpage
    162
  • Abstract
    We present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to the order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of other databases, the methodology will guarantee the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies are presented to retrieve documents from the identified databases. Experimental results are given to illustrate the relative performance of different strategies.
  • Keywords
    database management systems; information retrieval; search engines; text analysis; database ranking; document retrieval; most similar documents; multiple text databases; relative performance; statistical method; Australia; Computer networks; Database systems; ISDN; Indexing; Information retrieval; Information systems; Internet; Machine learning; Transaction databases
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Research and Technology Advances in Digital Libraries, 1999. Proceedings. IEEE Forum on
  • Conference_Location
    Baltimore, MD
  • ISSN
    1092-9959
  • Print_ISBN
    0-7695-0219-9
  • Type
    conf
  • DOI
    10.1109/ADL.1999.777710
  • Filename
    777710