Author/Authors :
Aixin Sun* (author), Ee-Peng Lim (author)
Abstract :
Homepages usually describe important semantic information
about conceptual or physical entities; hence,
they are the main targets for searching and browsing. To
facilitate semantic-based information retrieval (IR) at a
Web site, homepages can be identified and classified
under some predefined concepts and these concepts
are then used in query or browsing criteria, e.g., finding
professor homepages containing “information retrieval.”
In some Web sites, relationships may also exist
among homepages. These relationship instances (also
known as homepage relationships) enrich our knowledge
about these Web sites and allow more expressive semantic-based
IR. In this article, we investigate the features
to be used in mining homepage relationships. We systematically
develop different classes of inter-homepage features,
namely, navigation, relative-location, and common-item
features. We also propose deriving, for each
homepage, a set of support pages to obtain richer and
more complete content about the entity described by the
homepage. A homepage together with its support
pages is known as a Web unit. Our experiments on
the WebKB dataset show that extracting inter-homepage
features from Web units yields better homepage relationship
mining accuracy.