  • DocumentCode
    2732262
  • Title
    Distributional Similarity Model for Multi-modality Clustering in Social Media
  • Author
    Sze, Donahue C. M.; Fu, Tak-chung; Chung, Fu-lai; Luk, Robert W. P.
  • Author_Institution
    Hong Kong Polytech. Univ., Hong Kong
  • fYear
    2007
  • fDate
    5-12 Nov. 2007
  • Firstpage
    268
  • Lastpage
    271
  • Abstract
    User-generated content (UGC) has become the fastest-growing sector of the WWW. Mining UGC presents challenges not typically found in mining text from documents. UGC can be semi-structured, and its content can be very short and informal, much like a chat or an email conversation. In addition, UGC can be viewed as multi-modality data. These characteristics pose significant challenges and open research questions. To cluster UGC data, we can construct multiple contingency tables of modalities and employ the multi-way distributional clustering (MDC) algorithm. However, a contingency table, which summarizes the co-occurrence statistics of two modalities, cannot robustly represent the information entropy between those two modalities in UGC data. In this paper, we propose a novel similarity measure, called the distributional similarity model (DSM), to strengthen the graph model in the MDC algorithm so that it can handle the unique characteristics of UGC data.
  • Keywords
    Internet; data mining; user interfaces; distributional similarity model; email conversation; multi-modality clustering; multi-way distributional clustering; social media; text mining; user generated content; Clustering algorithms; Data mining; Information entropy; Intelligent agent; Machine learning algorithms; Robustness; Solid modeling; Text mining; User-generated content; World Wide Web; Social Media Analysis; Multi-Modality Clustering; Distributional Features;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Web Intelligence and Intelligent Agent Technology Workshops, 2007 IEEE/WIC/ACM International Conferences on
  • Conference_Location
    Silicon Valley, CA
  • Print_ISBN
    0-7695-3028-1
  • Type
    conf
  • DOI
    10.1109/WI-IATW.2007.105
  • Filename
    4427586