• DocumentCode
    583027
  • Title
    Automatic Facet Extraction Based on Multidimensional Semantic Index
  • Author
    Wei, Xiao; Luo, Xiangfeng; Li, Qing
  • Author_Institution
    High Performance Comput. Center, Shanghai Univ., Shanghai, China
  • fYear
    2012
  • fDate
    22-24 Oct. 2012
  • Firstpage
    64
  • Lastpage
    71
  • Abstract
    Faceted search on web pages requires accurate facets. However, extracting facets accurately from web pages is difficult because the pages are unstructured and lack facet information. Facet extraction is therefore key to faceted search. This paper proposes a method for automatically extracting facets from unstructured web pages in order to improve faceted search on the web. A Multidimensional Semantic Index (MDSI) of the web pages is constructed by mining the various semantic relations among the words in the pages, which yields a semantically rich index. Within the MDSI, the semantic indexes of different dimensions are bridged by mining the semantic mappings between them. Facets are then extracted by analyzing these semantic mapping relations. To validate the proposed method, two datasets are constructed, and the experimental results show that the method is feasible and comparatively precise.
  • Keywords
    Web sites; data mining; feature extraction; image retrieval; search problems; Web page MDSI; Web page mining; automatic facet extraction; facet information; facet search; multidimensional semantic index; semantic mapping; semantic-rich index; unstructured Web pages; Color; Communities; Dictionaries; Educational institutions; Google; Indexes; Semantics; facet extraction; faceted search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantics, Knowledge and Grids (SKG), 2012 Eighth International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4673-2561-5
  • Type
    conf
  • DOI
    10.1109/SKG.2012.22
  • Filename
    6391812