• DocumentCode
    1906150
  • Title

    A New Vision-Based Method for Extracting Academic Information from Conference Web Pages

  • Author

    Peng Wang ; Mingqi Zhou ; Yue You ; Xiang Zhang

  • Author_Institution
    Sch. of Comput. Sci. & Eng., Southeast Univ., Nanjing, China
  • Volume
    1
  • fYear
    2012
  • fDate
    7-9 Nov. 2012
  • Firstpage
    976
  • Lastpage
    981
  • Abstract
    This paper proposes a new vision-based method for extracting academic information from conference Web pages. The main contributions are: (1) A new vision-based page segmentation algorithm is proposed to improve on the results of the classical VIPS algorithm. This algorithm divides pages into text blocks. (2) All text blocks are classified into 10 categories according to vision features, keyword features, and text content features. The initial classification results have 75% precision and 67% recall. (3) The context information of text blocks is employed to repair and refine the initial classification results, which improve to 96% precision and 98% recall. Finally, academic information is extracted from the classified text blocks. Experimental results on real-world datasets show that the proposed method is effective and efficient for extracting academic information from conference Web pages.
  • Keywords
    Web sites; educational administrative data processing; information retrieval; text analysis; academic information extraction; classical VIPS algorithm; conference Web pages; keyword features; text blocks; text content features; vision-based method; vision-based page segmentation algorithm; Classification algorithms; Data mining; Feature extraction; Noise; Semantics; Web pages; Web information extraction; Web page segmentation; Bayesian network classifier
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2012 IEEE 24th International Conference on
  • Conference_Location
    Athens
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4799-0227-9
  • Type

    conf

  • DOI
    10.1109/ICTAI.2012.138
  • Filename
    6495152