Title :
Combining Multiple Features for Web Data Sources Clustering
Author :
Algergawy, Alsayed ; Saake, Gunter
Author_Institution :
Dept. of Comput. Sci., Otto-von-Guericke Univ., Magdeburg, Germany
Abstract :
The number of web data sources is growing significantly, and as a consequence, crucial data management issues must be addressed. Clustering, one of the issues that many researchers have focused on, has been proposed to improve information availability. To this end, we propose a feature-based approach for clustering web data sources without any human intervention, based only on features extracted from the source schemas. In particular, we combine linguistic and structure features of each data source to enhance the computation of schema similarity. We experimentally demonstrate the effectiveness of the proposed approach in terms of both clustering quality and runtime.
Keywords :
Internet; data handling; pattern clustering; Web data sources clustering; crucial data management; feature-based clustering; information availability; linguistic; multiple features; structure features; Clustering algorithms; Feature extraction; Power capacitors; Pragmatics; Semantics; Vectors; XML;
Conference_Title :
e-Business Engineering (ICEBE), 2013 IEEE 10th International Conference on
Conference_Location :
Coventry
DOI :
10.1109/ICEBE.2013.32