• DocumentCode
    175868
  • Title

    Cross domain web information extraction with multi-level feature model

  • Author

    Qian Chen ; Wenhao Zhu ; Chaoyou Ju ; Wu Zhang

  • Author_Institution
    Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
  • fYear
    2014
  • fDate
    19-21 Aug. 2014
  • Firstpage
    780
  • Lastpage
    784
  • Abstract
    One of the key problems in information extraction is designing a cross domain extraction procedure that can adapt to different domain topics and text formats. However, most information extraction methods focus on specific areas or scale only to a limited extent on semi-structured texts. We argue that the difficulty of cross domain information extraction stems largely from domain related features. For example, the features used for price extraction on e-commerce websites cannot be directly applied to extracting salaries from recruiting websites. In the worst case, an entirely new extraction model must be implemented, even though price and salary share common characteristics. In this paper we propose a cross domain solution that dismantles domain relevant features into sub-features that are less domain related. The sub-features include composite features (those that can be represented as a combination of several other features) and atomic features (features that cannot be dismantled further). To manage the features effectively, we propose a multi-level feature model that organizes the features as well as their relations. With this model, we give an information extraction method that can be quickly adapted when the extraction domain changes.
  • Keywords
    Internet; information retrieval; atomic features; composite features; cross domain Web information extraction; multilevel feature model; sub-features; Data mining; Electronic publishing; Encyclopedias; Feature extraction; cross domain; information extraction; multi-level feature model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Computation (ICNC), 2014 10th International Conference on
  • Conference_Location
    Xiamen
  • Print_ISBN
    978-1-4799-5150-5
  • Type

    conf

  • DOI
    10.1109/ICNC.2014.6975936
  • Filename
    6975936
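
The abstract describes dismantling domain relevant features into composite and atomic sub-features arranged in levels. The sketch below is one possible reading of that idea, not the authors' implementation: the class names, the specific atomic features, the regular expressions, and the assumption that a composite feature matches only when all of its parts match are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class AtomicFeature:
    """A feature that is not dismantled further (hypothetical representation)."""
    name: str
    test: Callable[[str], bool]

    def matches(self, text: str) -> bool:
        return self.test(text)


@dataclass
class CompositeFeature:
    """A feature represented as a combination of other features.

    Assumption: "combination" means every part must match.
    """
    name: str
    parts: List[Union["AtomicFeature", "CompositeFeature"]] = field(default_factory=list)

    def matches(self, text: str) -> bool:
        return all(part.matches(text) for part in self.parts)


# Lowest level: atomic features that are only weakly tied to a domain.
has_number = AtomicFeature(
    "has_number", lambda t: re.search(r"\d", t) is not None
)
has_currency = AtomicFeature(
    "has_currency", lambda t: re.search(r"[$€£¥]", t) is not None
)
has_pay_period = AtomicFeature(
    "has_pay_period",
    lambda t: re.search(r"\bper\s+(hour|month|year)\b", t, re.IGNORECASE) is not None,
)

# Middle level: a composite feature shared by both domains.
money_amount = CompositeFeature("money_amount", [has_number, has_currency])

# Top level: domain relevant features built on the shared level.
price = CompositeFeature("price", [money_amount])
salary = CompositeFeature("salary", [money_amount, has_pay_period])
```

Under these assumptions, moving from the e-commerce domain to the recruiting domain only requires defining `salary` on top of the existing `money_amount` feature plus one new atomic feature, rather than building a separate model. For instance, `price.matches("Only $19.99 today")` and `salary.matches("$5000 per month")` both hold, while `salary.matches("Only $19.99 today")` does not.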