Agglomerative co-clustering for synonymous phrases based on common effects and influences

Author

Kumanami, Koji ; Seki, Katsuyuki ; Uehara, Kazuhiro

Author_Institution

Grad. Sch. of Syst. Inf., Kobe Univ., Kobe, Japan

fYear

2013

fDate

6-9 Oct. 2013

Firstpage

87

Lastpage

94

Abstract

This paper proposes an approach to clustering synonymous noun phrases that focuses on two types of predicate-argument relations extracted from potentially big textual data: one associated with common effects, the other with common influences. Based on the context represented by those relations, a matrix is constructed whose rows are noun phrases and whose columns are pairs of a noun phrase and a verb phrase. Following the distributional hypothesis often adopted in the literature, it is assumed that rows (i.e., noun phrases) with similar distributions share similar meanings. Because the matrix is inherently sparse, however, two strategies are taken to group noun phrases having similar distributions. The first is simply to use a large-scale corpus, which in turn results in an even larger matrix; to handle it, a parallel distributed programming model, MapReduce, is employed. The second is to adopt hierarchical agglomerative co-clustering and approximate its computation in a way suited to the MapReduce programming model. The proposed approach is evaluated through a series of experiments in terms of the validity of the underlying assumptions, processing time, quality of the resulting clusters, and effect of parallelization.
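
The matrix construction described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the toy triples, the function names (`build_rows`, `cosine`, `most_similar_pair`), the "effect"/"influence" role tags, and the choice of cosine similarity are all assumptions made for illustration, and neither the MapReduce approximation nor the column side of co-clustering is attempted here.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: (subject noun phrase, verb phrase, object noun phrase).
TRIPLES = [
    ("smoking", "cause", "lung cancer"),
    ("tobacco use", "cause", "lung cancer"),
    ("smoking", "increase", "blood pressure"),
    ("tobacco use", "increase", "blood pressure"),
    ("exercise", "reduce", "blood pressure"),
]

def build_rows(triples):
    """Build a sparse matrix: one row per noun phrase, one column per
    (role, verb phrase, noun phrase) context. The role tag is an assumption
    used to keep the two relation types apart."""
    rows = defaultdict(Counter)
    for subj, verb, obj in triples:
        rows[subj][("effect", verb, obj)] += 1      # what subj does to obj
        rows[obj][("influence", verb, subj)] += 1   # what acts on obj
    return rows

def cosine(a, b):
    """Cosine similarity between two sparse rows stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_pair(rows):
    """Pick the pair of rows an agglomerative step would merge first."""
    return max(combinations(sorted(rows), 2),
               key=lambda pair: cosine(rows[pair[0]], rows[pair[1]]))

rows = build_rows(TRIPLES)
best_pair = most_similar_pair(rows)
print(best_pair, cosine(rows[best_pair[0]], rows[best_pair[1]]))
```

In this toy data, "smoking" and "tobacco use" share every context and are the first pair to be merged, while "exercise" shares none with either. With a realistic corpus most rows share few or no columns, which is the sparsity problem the abstract addresses through a larger corpus and co-clustering.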

Keywords

data handling; distributed programming; natural language processing; pattern clustering; MapReduce; agglomerative coclustering; big textual data; clustering synonymous noun phrases; distribution hypothesis; noun phrases; parallel distributed programming model; synonymous phrases; verb phrase; Approximation methods; Clustering algorithms; Context; Copper; Data mining; Guidelines; Programming; Distributional similarity; Hadoop/MapReduce; Parallel distributed processing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Big Data, 2013 IEEE International Conference on

Conference_Location

Silicon Valley, CA

Type

conf

DOI

10.1109/BigData.2013.6691738

Filename

6691738