• DocumentCode
    2949729
  • Title
    Designing and Implementing of the Webpage Information Extracting Model Based on Tags
  • Author
    Xu, Zhang ; Yan, Dong
  • Author_Institution
    Dept. of Inf., Peking Union Univ., Beijing, China
  • fYear
    2011
  • fDate
    20-21 Aug. 2011
  • Firstpage
    273
  • Lastpage
    275
  • Abstract
    In this article, a novel tag-based model for Webpage information extraction is presented. With its ingenious algorithm, the model performed better than Html Parser and Jsoup in most cases. It can serve as a URL filter for a Net Crawler in order to enhance the crawler's efficiency.
  • Keywords
    Web sites; information retrieval; search engines; URL filter; Web page information extracting model; net crawler; tags; Context; Data mining; HTML; Law; Search engines; Web pages; Html Parser; Html Tag; Jsoup; Webpage information extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligence Science and Information Engineering (ISIE), 2011 International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4577-0960-9
  • Electronic_ISBN
    978-0-7695-4480-9
  • Type
    conf
  • DOI
    10.1109/ISIE.2011.71
  • Filename
    5997433