• DocumentCode
    3122212
  • Title

    Effective XML Keyword Search with Relevance Oriented Ranking

  • Author

    Bao, Zhifeng ; Ling, Tok Wang ; Chen, Bo ; Lu, Jiaheng

  • Author_Institution
    Sch. of Comput., Nat. Univ. of Singapore, Singapore
  • fYear
    2009
  • fDate
    March 29 2009-April 2 2009
  • Firstpage
    517
  • Lastpage
    528
  • Abstract
    Inspired by the great success of information retrieval (IR) style keyword search on the Web, keyword search on XML has emerged recently. The differences between text databases and XML databases result in three new challenges: (1) Identify the user's search intention, i.e., identify the XML node types that the user wants to search for and search via. (2) Resolve keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text values of different XML node types and carry different meanings. (3) As the search results are sub-trees of the XML document, a new scoring function is needed to estimate their relevance to a given query. However, existing methods cannot resolve these challenges and thus return results of low quality in terms of query relevance. In this paper, we propose an IR-style approach that utilizes the statistics of the underlying XML data to address these challenges. We first propose specific guidelines that a search engine should meet in both search intention identification and relevance-oriented ranking of search results. Then, based on these guidelines, we design novel formulae to identify the search-for nodes and search-via nodes of a query, and present a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions. Lastly, the proposed techniques are implemented in an XML keyword search engine called XReal, and extensive experiments show the effectiveness of our approach.
  • Keywords
    XML; database management systems; search engines; XML database; XML document; XReal; effective XML keyword search engine; information retrieval style keyword search; keyword ambiguity problems; relevance oriented ranking; scoring function; search intention identification; text database; user search intention; Data engineering; Databases; Guidelines; Information retrieval; Keyword search; Optical computing; Search engines; Statistics; Web search; keyword; ranking; search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type

    conf

  • DOI
    10.1109/ICDE.2009.16
  • Filename
    4812431