Title :
Structural abstractions of hypertext documents for Web-based retrieval
Author :
Deogun, Jitender S. ; Sever, Hayri ; Ragh, Vijay V.
Author_Institution :
Dept. of Comput. Sci. & Eng., Nebraska Univ., Lincoln, NE, USA
Abstract :
There have been conflicting views in the literature on the capability of tools and mechanisms for storing and accessing information over the Internet. On the one hand, it has long been claimed that the World Wide Web offers a chaotic environment for Web agents to extract information, because the description of a document in HTML is easily comprehensible by humans but not by machines. On the other hand, it has been hypothesized that information is sufficiently structured to facilitate effective Web mining, especially for electronic catalogs. In this article we do not intend to take a position on this matter, but rather investigate the performance of a search engine while indexing more logical elements of HTML documents and while increasing the scope of the indexing process.
Keywords :
Internet; hypermedia; indexing; information retrieval; HTML; Web-based retrieval; World Wide Web; electronic catalogs; hypertext documents; structural abstractions; Chaos; Data mining; Humans; Search engines; Web mining; Web sites
Conference_Title :
Database and Expert Systems Applications, 1998. Proceedings. Ninth International Workshop on
Conference_Location :
Vienna
Print_ISBN :
0-8186-8353-8
DOI :
10.1109/DEXA.1998.707429