DocumentCode :
3321138
Title :
Deep Web Entity Identification Method Based on Improved Jaccard Coefficients
Author :
Wang, Yu ; Li, Ying-hua
Author_Institution :
Key Lab. in Machine Learning & Comput. Intell., Hebei Univ., Baoding, China
fYear :
2009
fDate :
28-29 Dec. 2009
Firstpage :
112
Lastpage :
115
Abstract :
There are a large number of accessible deep Web sites on the Internet. However, the same entity may be represented in different formats on different Web sites, so entity identification plays a crucial role in deep Web data mining. This paper proposes an entity identification method for the domain of Chinese books. First, improved Jaccard coefficients are used to calculate the similarity of text attributes. Second, AHP (analytic hierarchy process) is used to obtain the attribute weights, and the weighted sum of the attribute similarities gives the entity similarity. Finally, duplicate entities are merged to complete entity identification. The experimental results demonstrate that the approach achieves higher accuracy and good feasibility.
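The three-step pipeline in the abstract (attribute similarity, AHP weighting, weighted-sum entity similarity, then merging duplicates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the "improved" Jaccard coefficient, so the sketch uses the standard Jaccard coefficient over character bigrams; AHP weights are approximated with the row geometric-mean method; and the attribute names, pairwise comparison values, threshold, sample records, and all function names are hypothetical.

```python
import math


def char_bigrams(text):
    """Character bigrams; Chinese titles lack word spaces (assumed tokenization)."""
    text = "".join(text.split())  # drop all whitespace before tokenizing
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Standard Jaccard coefficient |A & B| / |A | B| over character bigrams.
    NOT the paper's improved variant, which the abstract does not describe."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    if not sa and not sb:
        return 1.0  # two empty values are treated as identical
    return len(sa & sb) / len(sa | sb)


def ahp_weights(matrix):
    """AHP priority weights via the row geometric-mean approximation of the
    principal eigenvector of a reciprocal pairwise comparison matrix."""
    n = len(matrix)
    means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]  # normalized so the weights sum to 1


def entity_similarity(r1, r2, attrs, weights):
    """Entity similarity as the weighted sum of attribute similarities."""
    return sum(w * jaccard(r1[a], r2[a]) for a, w in zip(attrs, weights))


def merge_duplicates(records, attrs, weights, threshold):
    """Greedy grouping: a record joins the first cluster whose representative
    (its first member) it matches at or above the threshold."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if entity_similarity(cluster[0], rec, attrs, weights) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])  # no match: start a new entity cluster
    return clusters


if __name__ == "__main__":
    attrs = ["title", "author", "publisher"]
    # Hypothetical judgments: title is 3x as important as author, 5x publisher.
    comparisons = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
    weights = ahp_weights(comparisons)
    records = [
        {"title": "数据挖掘概念与技术", "author": "韩家炜", "publisher": "机械工业出版社"},
        {"title": "数据挖掘：概念与技术", "author": "韩家炜", "publisher": "机械工业出版社"},
        {"title": "机器学习", "author": "周志华", "publisher": "清华大学出版社"},
    ]
    clusters = merge_duplicates(records, attrs, weights, threshold=0.75)
    print(len(clusters))  # → 2
```

With these hypothetical comparisons the weights come out at roughly 0.65, 0.23 and 0.12, and the two variants of the same title (bigram Jaccard 0.7) still score about 0.81 overall, so they merge. The greedy representative-based grouping is a simplification; a transitive-closure or clustering step could be substituted for the merge stage.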
Keywords :
Internet; data mining; decision making; identification; statistical analysis; Chinese books; Internet; analytic hierarchy process; deep Web data mining; deep Web entity identification method; deep Web sites; improved Jaccard coefficients; text attributes; Books; Competitive intelligence; Computational intelligence; Computer science; Data mining; Internet; Learning systems; Machine learning; Support vector machines; Web pages; AHP; Deep web; Entity identification; Jaccard coefficients;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Research Challenges in Computer Science, 2009. ICRCCS '09. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3927-0
Electronic_ISBN :
978-1-4244-5410-5
Type :
conf
DOI :
10.1109/ICRCCS.2009.36
Filename :
5401315