• DocumentCode
    655276
  • Title
    Combining Multiple Features for Web Data Sources Clustering
  • Author
    Algergawy, Alsayed ; Saake, Gunter
  • Author_Institution
    Dept. of Comput. Sci., Otto-von-Guericke Univ., Magdeburg, Germany
  • fYear
    2013
  • fDate
    11-13 Sept. 2013
  • Firstpage
    213
  • Lastpage
    218
  • Abstract
    The number of web data sources is growing significantly, and as a consequence, crucial data management issues must be addressed. Clustering is one of the issues that many researchers have focused on, and it has been proposed as a way to improve information availability. To this end, we propose a feature-based approach for clustering web data sources without any human intervention, based only on features extracted from the source schemas. In particular, we combine linguistic and structure features of each data source to enhance the computation of schema similarity. We experimentally demonstrate the effectiveness of the proposed approach in terms of both clustering quality and runtime.
  • Keywords
    Internet; data handling; pattern clustering; Web data sources clustering; crucial data management; feature-based clustering; information availability; linguistic; multiple features; structure features; Clustering algorithms; Feature extraction; Power capacitors; Pragmatics; Semantics; Vectors; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    e-Business Engineering (ICEBE), 2013 IEEE 10th International Conference on
  • Conference_Location
    Coventry
  • Type
    conf
  • DOI
    10.1109/ICEBE.2013.32
  • Filename
    6686265