• DocumentCode
    2861312
  • Title
    PEWeb: Product Extraction from the Web Based on Entropy Estimation
  • Author
    Phan, Xuan Hieu ; Horiguchi, Susumu ; Ho, Tu Bao
  • Author_Institution
    Japan Advanced Institute of Science and Technology
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    590
  • Lastpage
    593
  • Abstract
    Mining product descriptions (PDs) from e-commerce web sites is an important task in information extraction from the Web. In this paper, we propose an efficient technique for this task. The technique first discovers the set of PDs based on an entropy measure computed at each internal node of the HTML tag tree. Afterwards, a set of association rules based on heuristic features is employed to filter the output and thereby enhance precision. The experimental results of the PEWeb system show that the proposed method remarkably outperforms existing automatic techniques.
  • Keywords
    Association rules; Data mining; Entropy; Explosions; Filters; HTML; Information science; Manufacturing; Ontologies; Particle separators
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type
    conf
  • DOI
    10.1109/WI.2004.10102
  • Filename
    1410874