Title of article
Main Content Extraction from Detailed Web Pages
Author/Authors
Mohsen Asfia (author), Mir Mohsen Pedram (author), Amir Masoud Rahmani (author)
Issue Information
Periodical with serial issue number, year 2010
Pages
4
From page
18
To page
21
Abstract
As we know, detailed web pages on the internet contain information that is not considered primary content, such as advertisements, headers, footers, navigation links, and copyright information. In addition, search engines prefer not to index information such as comments and reviews as informative content, so an algorithm that extracts only the main content could improve the quality of web page indexing. Almost all proposed algorithms are tag dependent, meaning they can only look for primary content among specific tags such as or