DocumentCode
2613897
Title
Web documents categorization using fuzzy representation and HAC
Author
Deng, Jiawei ; Chen, Lihui
Author_Institution
Sch. of Electr. & Electron. Eng., Nanyang Tech. Univ., Singapore
Volume
2
fYear
2000
fDate
2000
Firstpage
24
Abstract
Most of the existing techniques for the characterization of Web documents are based on term-frequency analysis. In such models, given a set of documents, the characterization of each document is represented by a feature vector in a vector space. However, as Web documents written in HTML are semi-structured by means of tags, the traditional techniques that assign term weights only by the frequency of occurrence may not be able to provide satisfactory results in representing the content of such documents. Some recent studies have shown that the fuzzy representation (FR) of WWW information based on the significance of HTML tags is an effective alternative for characterizing Web documents. In this paper, the FR is used to generate the feature vector for each Web document and the hierarchical agglomerative clustering (HAC) algorithm is applied to investigate its efficiency and effectiveness for the automatic categorization of Web documents with similar contents. Experiments that have been conducted suggest several benefits of using such an approach.
Keywords
classification; fuzzy set theory; hypermedia markup languages; information resources; pattern clustering; vectors; HAC algorithm; HTML tags; World Wide Web document categorization; document characterization; document content representation; feature vector; fuzzy representation; hierarchical agglomerative clustering; occurrence frequency; semi-structured documents; term weights; term-frequency analysis; vector space; Clustering algorithms; Frequency; HTML; Information retrieval; Internet; Natural languages; Navigation; Probes; Web pages; World Wide Web
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Information Systems Engineering, 2000. Proceedings of the First International Conference on
Conference_Location
Hong Kong
Print_ISBN
0-7695-0577-5
Type
conf
DOI
10.1109/WISE.2000.882848
Filename
882848