• DocumentCode
    659589
  • Title
    Agglomerative co-clustering for synonymous phrases based on common effects and influences

  • Author
    Kumanami, Koji ; Seki, Katsuyuki ; Uehara, Kazuhiro
  • Author_Institution
    Grad. Sch. of Syst. Inf., Kobe Univ., Kobe, Japan
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    87
  • Lastpage
    94
  • Abstract
    This paper proposes an approach to clustering synonymous noun phrases that focuses on two types of predicate-argument relations extracted from potentially big textual data. One is associated with common effects, the other with common influences. Based on the context represented by those relations, a matrix is constructed whose rows are noun phrases and whose columns are pairs of a noun phrase and a verb phrase. Following the distributional hypothesis often adopted in the literature, it is assumed that rows (i.e., noun phrases) with similar distributions share similar meanings. Because the matrix is inherently sparse, however, two strategies are taken to group noun phrases having similar distributions. One strategy is to simply use a large-scale corpus, which, however, results in an even larger matrix. To handle the large matrix, a parallel distributed programming model, MapReduce, is employed. The other is to adopt hierarchical agglomerative co-clustering and approximate its computation in a way suited to the MapReduce programming model. The proposed approach is evaluated in a series of experiments in terms of the validity of the underlying assumptions, processing time, quality of the resulting clusters, and effect of parallelization.
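The matrix construction described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy triples, the function names `build_rows` and `cosine`, the reading of "effects" as contexts where the noun phrase is the subject and "influences" as contexts where it is the object, and the use of cosine similarity as the distributional measure. It runs on a single machine and does not cover the MapReduce or co-clustering steps.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical (subject noun phrase, verb phrase, object noun phrase) triples.
triples = [
    ("smoking", "causes", "lung cancer"),
    ("tobacco use", "causes", "lung cancer"),
    ("smoking", "increases", "blood pressure"),
    ("tobacco use", "increases", "blood pressure"),
    ("exercise", "reduces", "blood pressure"),
]

def build_rows(triples):
    """Map each noun phrase (row) to sparse counts over (verb, noun phrase) columns."""
    rows = defaultdict(Counter)
    for subj, verb, obj in triples:
        # Assumed "effect" context: what the subject does to another phrase.
        rows[subj][("effect", verb, obj)] += 1
        # Assumed "influence" context: what acts on the object.
        rows[obj][("influence", verb, subj)] += 1
    return rows

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

rows = build_rows(triples)
sim_syn = cosine(rows["smoking"], rows["tobacco use"])   # identical contexts
sim_other = cosine(rows["smoking"], rows["exercise"])    # no shared contexts
print(sim_syn, sim_other)
```

In this toy data, "smoking" and "tobacco use" share every column and so score as near-identical, while "smoking" and "exercise" share none. With a real corpus most rows share few columns, which is the sparsity problem the abstract refers to.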
  • Keywords
    data handling; distributed programming; natural language processing; pattern clustering; MapReduce; agglomerative coclustering; big textual data; clustering synonymous noun phrases; distribution hypothesis; noun phrases; parallel distributed programming model; synonymous phrases; verb phrase; Approximation methods; Clustering algorithms; Context; Copper; Data mining; Guidelines; Programming; Distributional similarity; Hadoop/MapReduce; Parallel distributed processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691738
  • Filename
    6691738