• DocumentCode
    1655230
  • Title
    Enrich Web Entity Schema Based on Integrated Annotation
  • Author
    Yan Zhang ; Qingzhong Li
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
  • fYear
    2013
  • Firstpage
    153
  • Lastpage
    158
  • Abstract
    Web integration systems (WIS) need to effectively collect web objects belonging to a specific domain from different websites. Most WIS rely on entity schemas defined beforehand by domain experts. Because the web is inherently diverse and variable, it is hard to model a web entity comprehensively in advance; furthermore, wrong annotations occur when object values from different websites are aligned into the WIS. To overcome these limitations, we propose an integrated annotating method that combines a matching strategy with machine learning technology to dynamically discover synonyms for predefined attribute labels, as well as new attribute labels, for a specified type of web entity. Experimental results on real-world data in the book and job domains show that the proposed approach is effective in enriching the web entity schema, enhancing the performance of the data collection process in a WIS.
  • Keywords
    Internet; Web sites; data integration; learning (artificial intelligence); real-time systems; WIS; Web entity schema; Web integration systems; Websites; data collection process; domain experts; integrated annotating method; machine learning technology; matching strategy; real-world data; Information systems; conditional random fields; web entity; web entity annotation; web entity schema;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information System and Application Conference (WISA), 2013 10th
  • Conference_Location
    Yangzhou
  • Print_ISBN
    978-1-4799-3218-4
  • Type
    conf
  • DOI
    10.1109/WISA.2013.37
  • Filename
    6778628