Rough set clustering for Web mining

Author

Lingras, Pawan

Author_Institution

Saint Mary's Univ., Halifax, NS, Canada

Volume

2

fYear

2002

fDate

2002

Firstpage

1039

Lastpage

1044

Abstract

As in traditional data mining, three important Web mining operations are clustering, association, and sequential analysis. Typical clustering operations in Web mining involve finding natural groupings of Web resources or Web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in Web mining. For example, the clusters and associations in Web mining do not necessarily have crisp boundaries. Moreover, for a variety of reasons inherent in Web browsing and Web logging, bad or incomplete data are more likely. As a result, researchers have studied the possibility of using fuzzy sets in Web mining clustering applications. The paper describes how rough set theory can also be used to develop clustering schemes for Web mining. The unsupervised classification described in the paper uses properties of rough sets along with genetic algorithms to represent clusters as interval sets. The paper also describes the design of an experiment, including data collection and the clustering process. The experiment is used to create interval set representations of groups of Web visitors.
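The abstract's central idea is that each cluster is an interval set: a lower approximation holding objects that certainly belong to the cluster and an upper approximation holding objects that possibly belong to it. The sketch below is a minimal illustration under stated assumptions, not the paper's method. It does not reproduce the genetic-algorithm search; instead it substitutes a hypothetical distance-ratio rule (in the style of rough k-means) to place Web visitors, encoded as numeric feature vectors, into lower and upper approximations. It then checks the rough-set properties commonly used in this line of work: an object lies in at most one lower approximation, each lower approximation is contained in its upper approximation, and an object outside every lower approximation lies in at least two upper approximations. The names `assign_interval_clusters`, `satisfies_rough_properties`, the `ratio` parameter and the feature encoding are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]
# Hypothetical interval-set cluster: (lower approximation, upper approximation),
# each a set of object indices.
IntervalCluster = Tuple[Set[int], Set[int]]


def assign_interval_clusters(points: Sequence[Point],
                             centroids: Sequence[Point],
                             ratio: float = 1.3) -> List[IntervalCluster]:
    """Assumed distance-ratio rule (a stand-in for the paper's GA search).

    Every centroid whose distance is within `ratio` times the nearest distance
    receives the object in its upper approximation. If only the nearest centroid
    qualifies, the object also goes into that cluster's lower approximation;
    otherwise it stays in the boundary region of several clusters.
    """
    clusters: List[IntervalCluster] = [(set(), set()) for _ in centroids]
    for i, p in enumerate(points):
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(len(centroids)), key=dists.__getitem__)
        close = [k for k, d in enumerate(dists) if d <= ratio * dists[nearest]]
        for k in close:
            clusters[k][1].add(i)          # possibly belongs: upper approximation
        if len(close) == 1:
            clusters[nearest][0].add(i)    # certainly belongs: lower approximation
    return clusters


def satisfies_rough_properties(n_objects: int,
                               clusters: Sequence[IntervalCluster]) -> bool:
    """Check the three rough-set clustering properties listed in the lead-in."""
    for lower, upper in clusters:
        if not lower <= upper:             # lower must be a subset of upper
            return False
    for obj in range(n_objects):
        in_lower = sum(obj in lower for lower, _ in clusters)
        in_upper = sum(obj in upper for _, upper in clusters)
        if in_lower > 1:                   # at most one lower approximation
            return False
        if in_lower == 0 and in_upper < 2: # boundary objects need >= 2 uppers
            return False
    return True


if __name__ == "__main__":
    # Toy one-dimensional "visitor" features; the last visitor sits between groups.
    visitors = [(0.0,), (0.1,), (1.0,), (0.55,)]
    centers = [(0.0,), (1.0,)]
    result = assign_interval_clusters(visitors, centers)
    for idx, (lower, upper) in enumerate(result):
        print(f"cluster {idx}: lower={sorted(lower)} upper={sorted(upper)}")
    print("rough properties hold:", satisfies_rough_properties(len(visitors), result))
```

In this toy run the visitor at 0.55 is roughly equidistant from both centers, so it lands only in the two upper approximations, which is exactly the kind of non-crisp boundary the abstract motivates. The fixed `ratio` threshold is a simplification; the paper instead searches for the interval sets with a genetic algorithm.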

Keywords

data mining; genetic algorithms; information resources; information retrieval; pattern clustering; rough set theory; Web browsing; Web mining; Web resources; Web users; association; clustering; interval sets; logging; natural groupings; rough set clustering; sequential analysis; unsupervised classification; Bioinformatics; Electronic mail; Fuzzy sets; Genomics; Rough sets; Set theory; Web sites

fLanguage

English

Publisher

IEEE

Conference_Title

Fuzzy Systems, 2002. FUZZ-IEEE'02. Proceedings of the 2002 IEEE International Conference on

Conference_Location

Honolulu, HI

Print_ISBN

0-7803-7280-8

Type

conf

DOI

10.1109/FUZZ.2002.1006647

Filename

1006647