Title :
A fuzzy representation of HTML documents for information retrieval systems
Author :
Molinari, Andrea ; Pasi, Gabriella
Author_Institution :
Dipartimento di Informatica e Studi Aziendali, Trento Univ., Italy
Abstract :
The diffusion of the World Wide Web (WWW) on the Internet, and the consequent increase in the production and exchange of textual information, demand the development of effective retrieval systems. The typical textual document on the WWW is defined in HTML (HyperText Markup Language), in which the document is structured into subparts by means of tags. In this paper, an approach to the indexing of HTML documents is proposed, based on the assumption that tags give the text different levels of importance with respect to the document content. A significance degree of an index term can then be computed by weighting the term occurrences according to the "importance" associated with the tags in which they appear. In this way, the numeric significance degree of a term takes into account the author's explicit indications of the different importance of the term in the document.
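The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so everything specific here is an assumption made for illustration: the tag weights in TAG_WEIGHTS, the default weight, the rule that a term takes the highest weight among its enclosing tags, and the normalization by the maximum score so that degrees fall in [0, 1]. The names TagWeightedIndexer and significance_degrees are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

# Hypothetical importance weights per tag (assumed, not taken from the paper).
TAG_WEIGHTS = {"title": 1.0, "h1": 0.8, "h2": 0.6, "b": 0.4}
DEFAULT_WEIGHT = 0.2  # assumed weight for text outside the listed tags


class TagWeightedIndexer(HTMLParser):
    """Accumulates, per term, the sum of the weights of its occurrences."""

    def __init__(self):
        super().__init__()
        self.open_tags = []               # stack of currently open weighted tags
        self.scores = defaultdict(float)  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        # Only tags with an assigned importance need to be tracked.
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Assumption: an occurrence takes the highest weight among enclosing tags.
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags),
                     default=DEFAULT_WEIGHT)
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.scores[term] += weight


def significance_degrees(html):
    """Return term -> degree in [0, 1], normalized by the largest score."""
    indexer = TagWeightedIndexer()
    indexer.feed(html)
    indexer.close()
    top = max(indexer.scores.values(), default=0.0)
    if top == 0.0:
        return {}
    return {term: score / top for term, score in indexer.scores.items()}
```

Under these assumed weights, a term that occurs in both the title and a top-level heading receives a higher degree than one that occurs only in body text, which is the behaviour the abstract describes. A different aggregation (for example, a fuzzy union over tag-level memberships) could be substituted without changing the overall structure.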
Keywords :
Internet; document handling; fuzzy set theory; hypermedia; indexing; information retrieval systems; page description languages; HTML documents; HyperText Marking Language; HyperText Markup Language; WWW; fuzzy representation; significance degree; subparts; tags; textual information; Content based retrieval; Delay; Frequency; Fuzzy systems; HTML; Indexing; Information retrieval; Mathematical model; World Wide Web
Conference_Title :
Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, 1996
Conference_Location :
New Orleans, LA
Print_ISBN :
0-7803-3645-3
DOI :
10.1109/FUZZY.1996.551727