• DocumentCode
    866808
  • Title

    A statistical method for estimating the usefulness of text databases

  • Author

    Liu, King-Lup ; Yu, Clement ; Meng, Weiyi ; Wu, Wensheng ; Rishe, Naphtali

  • Author_Institution
    Sch. of Comput. Sci., Telecommun. & Inf. Syst., DePaul Univ., Chicago, IL, USA
  • Volume
    14
  • Issue
    6
  • fYear
    2002
  • Firstpage
    1422
  • Lastpage
    1437
  • Abstract
    Searching for desired data is one of the most common uses of the Internet. No single search engine is capable of searching all data on the Internet. An approach that provides a single interface for invoking multiple search engines for each user query has the potential to satisfy more users. When the number of search engines under the interface is large, invoking all of them for each query is often not cost-effective: sending the query to a large number of useless search engines creates unnecessary network traffic, and searching those engines wastes local resources. The problem can be overcome if the usefulness of every search engine with respect to each query can be predicted. We present a statistical method to estimate the usefulness of a search engine for any given query. In this paper, the usefulness of a search engine for a given query is defined as a combination of the number of documents in the search engine that are sufficiently similar to the query and the average similarity of these documents. Experimental results indicate that our estimation method is much more accurate than existing methods.
  • Keywords
    Internet; full-text databases; information resources; information retrieval; search engines; statistical analysis; cost effective; documents; experimental results; information resource discovery; metasearch; search engine; searching; statistical method; text databases; Costs; Databases; Telecommunication traffic
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2002.1047777
  • Filename
    1047777