Title :
Deep Web Entity Identification Method Based on Improved Jaccard Coefficients
Author :
Wang, Yu ; Li, Ying-hua
Author_Institution :
Key Lab. in Machine Learning & Comput. Intell., Hebei Univ., Baoding, China
Abstract :
The Internet hosts a large number of accessible deep Web sites. However, the same entity is often represented in different formats on different Web sites, so entity identification plays a crucial role in deep Web data mining. This paper proposes an entity identification method for the domain of Chinese books. First, improved Jaccard coefficients are used to calculate the similarity of text attributes. Second, the analytic hierarchy process (AHP) is used to obtain the attribute weights, and the weighted sum of the attribute similarities is used to calculate the entity similarity. Finally, duplicate entities are merged to complete the entity identification. The experimental results demonstrate that the approach achieves higher accuracy and good feasibility.
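The pipeline described in the abstract (attribute similarity, AHP weights, weighted entity similarity) can be illustrated with a minimal sketch. Several parts are assumptions, not taken from the paper: the abstract does not specify how the Jaccard coefficient is improved, so the plain Jaccard coefficient over character bigrams stands in for it; the weights are derived with the row geometric mean approximation commonly used with AHP; and the attribute names, the pairwise comparison matrix, and the sample records are hypothetical.

```python
from math import prod


def char_bigrams(text):
    """Split a string into a set of character bigrams (whitespace removed)."""
    s = "".join(text.split()).lower()
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Plain Jaccard coefficient |A ∩ B| / |A ∪ B| over character bigrams.

    Assumption: this is the standard coefficient, not the paper's improved one.
    """
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        return 1.0  # two empty attributes are treated as identical
    return len(x & y) / len(x | y)


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric mean method."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def entity_similarity(e1, e2, attrs, weights):
    """Weighted sum of per-attribute similarities."""
    return sum(w * jaccard(e1[a], e2[a]) for a, w in zip(attrs, weights))


# Hypothetical attributes and pairwise comparison matrix (title > author > publisher).
ATTRS = ["title", "author", "publisher"]
PAIRWISE = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
WEIGHTS = ahp_weights(PAIRWISE)

# Hypothetical book records from two different sites.
book_a = {"title": "数据挖掘导论", "author": "王宇", "publisher": "河北大学出版社"}
book_b = {"title": "数据挖掘导论 ", "author": "王宇", "publisher": "河北大学出版社"}

score = entity_similarity(book_a, book_b, ATTRS, WEIGHTS)
```

In a sketch like this, two records would be treated as the same entity when `score` exceeds a chosen threshold; the threshold value is a tuning decision and is not given in the abstract.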
Keywords :
Internet; data mining; decision making; identification; statistical analysis; Chinese books; analytic hierarchy process; deep Web data mining; deep Web entity identification method; deep Web sites; improved Jaccard coefficients; text attributes; Books; Competitive intelligence; Computational intelligence; Computer science; Learning systems; Machine learning; Support vector machines; Web pages; AHP; Deep web; Entity identification; Jaccard coefficients
Conference_Title :
Research Challenges in Computer Science, 2009. ICRCCS '09. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3927-0
Electronic_ISBN :
978-1-4244-5410-5
DOI :
10.1109/ICRCCS.2009.36