DocumentCode :
1655230
Title :
Enrich Web Entity Schema Based on Integrated Annotation
Author :
Yan Zhang ; Qingzhong Li
Author_Institution :
Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
fYear :
2013
Firstpage :
153
Lastpage :
158
Abstract :
Web integration systems (WIS) need to effectively collect web objects belonging to a specific domain from different websites. In most WIS, entity schemas are defined beforehand by domain experts. Because the web is inherently diverse and variable, it is hard to model a web entity comprehensively in advance; furthermore, wrong annotations occur when object values from different websites are aligned into the WIS. To overcome these limitations, we propose an integrated annotation method that combines a matching strategy with machine learning to dynamically discover synonyms for predefined attribute labels, as well as new attribute labels, for a specified type of web entity. Experimental results on real-world data in the book and job domains show that the proposed approach is effective in enriching the web entity schema and thereby improves the performance of the data collection process in a WIS.
Keywords :
Internet; Web sites; data integration; learning (artificial intelligence); real-time systems; WIS; Web entity schema; Web integration systems; Websites; data collection process; domain experts; integrated annotating method; machine learning technology; matching strategy; real-world data; Information systems; conditional random fields; web entity; web entity annotation; web entity schema;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information System and Application Conference (WISA), 2013 10th
Conference_Location :
Yangzhou
Print_ISBN :
978-1-4799-3218-4
Type :
conf
DOI :
10.1109/WISA.2013.37
Filename :
6778628