DocumentCode
3374494
Title
A new study on using HTML structures to improve retrieval
Author
Cutler, M. ; Deng, H. ; Maniccam, S.S. ; Meng, W.
Author_Institution
Dept. of Comput. Sci., State Univ. of New York, Binghamton, NY, USA
fYear
1999
fDate
1999
Firstpage
406
Lastpage
409
Abstract
Locating useful information effectively from the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology that uses the structures and hyperlinks of HTML documents to improve the effectiveness of retrieving HTML documents. The methodology partitions the occurrences of terms in a document collection into classes according to the tags in which a particular term appears (such as Title, H1-H6, and Anchor). The rationale is that terms appearing in different structures of a document may have different significance in identifying the document. The weighting schemes of traditional information retrieval were extended to include class importance values. We implemented a genetic algorithm to determine a “best so far” class importance factor combination. Our experiments indicate that, using this technique, the retrieval effectiveness can be improved by 39.6% or higher.
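The class-based weighting described in the abstract can be illustrated with a minimal sketch. It assumes that each term occurrence is tagged with the HTML class it appears in and that a term's weight is the sum of the importance values of its occurrences. The class names, the importance values, and the function name `weighted_term_frequency` are hypothetical and chosen for illustration only; the record does not give the paper's actual formula, and the genetic-algorithm search for the importance values is not shown.

```python
from collections import Counter

# Hypothetical class importance values (not taken from the paper).
CLASS_IMPORTANCE = {"title": 4.0, "header": 3.0, "anchor": 2.0, "plain": 1.0}


def weighted_term_frequency(occurrences, importance=CLASS_IMPORTANCE):
    """Sum class importance values over a document's term occurrences.

    occurrences: iterable of (term, class_name) pairs for one document.
    Returns a dict mapping each term to its class-weighted frequency.
    """
    weights = Counter()
    for term, cls in occurrences:
        # Each occurrence contributes the importance of its enclosing class.
        weights[term] += importance[cls]
    return dict(weights)


# One toy document: every occurrence is labelled with its HTML class.
doc = [
    ("retrieval", "title"),
    ("retrieval", "plain"),
    ("html", "header"),
    ("html", "plain"),
    ("html", "plain"),
    ("web", "anchor"),
]
scores = weighted_term_frequency(doc)
```

With all importance values set to 1.0 the function reduces to a plain term count, so the class importance values act as the only extension over a traditional term-frequency weight in this sketch.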
Keywords
genetic algorithms; hypermedia markup languages; information resources; information retrieval; query processing; HTML structures; World Wide Web; class importance values; document collection; genetic algorithm; hyperlinks; retrieval; retrieval effectiveness; retrieving HTML documents; Databases; Electronic switching systems; Frequency; Genetics; HTML; Indexes; Uniform resource locators; Web pages; Web sites
fLanguage
English
Publisher
IEEE
Conference_Titel
Tools with Artificial Intelligence, 1999. Proceedings. 11th IEEE International Conference on
Conference_Location
Chicago, IL
ISSN
1082-3409
Print_ISBN
0-7695-0456-6
Type
conf
DOI
10.1109/TAI.1999.809831
Filename
809831