DocumentCode
2861312
Title
PEWeb: Product Extraction from the Web Based on Entropy Estimation
Author
Phan, Xuan Hieu ; Horiguchi, Susumu ; Ho, Tu Bao
Author_Institution
Japan Advanced Institute of Science and Technology
fYear
2004
fDate
20-24 Sept. 2004
Firstpage
590
Lastpage
593
Abstract
Mining product descriptions (PDs) from e-commerce web sites is an important task in information extraction from the Web. In this paper, we propose an efficient technique for this task. The technique first discovers the set of PDs based on a measure of entropy at each internal node of the HTML tag tree. Afterwards, a set of association rules based on heuristic features is employed to filter the output and thereby enhance precision. Experimental results with the PEWeb system show that the proposed method remarkably outperforms existing automatic techniques.
Keywords
Association rules; Data mining; Entropy; Explosions; Filters; HTML; Information science; Manufacturing; Ontologies; Particle separators;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2100-2
Type
conf
DOI
10.1109/WI.2004.10102
Filename
1410874