• DocumentCode
    2781257
  • Title
    Chinese query expansion based on user log clustering
  • Author
    Jia, Shufang ; Li, Lei
  • Author_Institution
    Center for Intell. Sci. & Technol., Beijing Univ. of Posts & Telecommun., Beijing, China
  • fYear
    2009
  • fDate
    6-8 Nov. 2009
  • Firstpage
    446
  • Lastpage
    451
  • Abstract
    Most previous query expansion research is based on pseudo-relevant documents. In this study, we present a novel expansion method that clusters real user logs. Because not all clicked pages are suitable for query expansion, we de-noise the clicked results by reliability to improve performance. After HTML tags are removed, the page body contents are clustered, and the cluster centers cover various aspects of the original query. The terms used in log queries offer a better choice of features, from the user's point of view, for summarizing the Web pages clicked from these queries. Therefore, the associated queries, reverse queries, Web page titles, and keyword phrases are combined with the cluster centers to obtain high-quality expansion terms for new queries. We also propose a new terminology extraction method based on Baidu Baike, which identifies and extracts terminology phrases using this manually edited online dictionary.
  • Keywords
    Web sites; data mining; hypermedia markup languages; query processing; Baidu Baike; Chinese query expansion; HTML labels removal; Web page denoising; keyword phrases; manual edited online dictionary; page body contents; pseudo relevant documents; terminology phrase extraction; terminology phrase identification; user log clustering; Computer science; Data mining; Dictionaries; HTML; Information retrieval; Large scale integration; Noise reduction; Search engines; Terminology; Web pages; Baike terminology extraction; LSI clustering; Query expansion; log mining; webpage de-noising;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Network Infrastructure and Digital Content, 2009. IC-NIDC 2009. IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-4898-2
  • Electronic_ISBN
    978-1-4244-4900-6
  • Type
    conf
  • DOI
    10.1109/ICNIDC.2009.5360836
  • Filename
    5360836