• DocumentCode
    1696044
  • Title
    Removing non-informative blocks from the web pages
  • Author
    Gunasundari, R.; Karthikeyan, S.
  • Author_Institution
    Karpagam Univ., Coimbatore, India
  • fYear
    2010
  • Firstpage
    810
  • Lastpage
    814
  • Abstract
    With the enormous growth of the web, users easily get lost in its rich hyperlink structure. The primary task for website owners is therefore to develop user-friendly, automated tools that give users the relevant information they need without redundant links. Users are interested only in the informative content, not in non-informative content blocks. Web pages often contain navigation sidebars, advertisements, search blocks, copyright notices, etc., which are not content blocks. The information contained in these non-content blocks can harm web mining, so it is important to separate the informative primary content blocks from the non-informative blocks. This paper proposes three different algorithms for removing non-content blocks from web pages. Removing non-informative content blocks from web pages can yield significant savings in storage and time.
  • Keywords
    Web services; Web sites; content management; data mining; information retrieval; Web blocks; Web mining; Web pages; Website; informative contents; noisy blocks; non-informative content; Algorithm design and analysis; Entropy; Feature extraction; HTML; Web content mining; Web documents
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communication Control and Computing Technologies (ICCCCT), 2010 IEEE International Conference on
  • Conference_Location
    Ramanathapuram
  • Print_ISBN
    978-1-4244-7769-2
  • Type
    conf
  • DOI
    10.1109/ICCCCT.2010.5670731
  • Filename
    5670731