DocumentCode
659589
Title
Agglomerative co-clustering for synonymous phrases based on common effects and influences
Author
Kumanami, Koji ; Seki, Katsuyuki ; Uehara, Kazuhiro
Author_Institution
Grad. Sch. of Syst. Inf., Kobe Univ., Kobe, Japan
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
87
Lastpage
94
Abstract
This paper proposes an approach to clustering synonymous noun phrases that focuses on two types of predicate-argument relations extracted from potentially big textual data. One is associated with common effects, the other with common influences. Based on the context represented by those relations, a matrix is constructed whose rows are noun phrases and whose columns are pairs of a noun phrase and a verb phrase. Following the distributional hypothesis often adopted in the literature, it is assumed that rows (i.e., noun phrases) with similar distributions share similar meanings. Because the matrix is inherently sparse, two strategies are taken to group noun phrases having similar distributions. One strategy is simply to use a large-scale corpus, which, however, results in an even larger matrix. To handle the large matrix, a parallel distributed programming model, MapReduce, is employed. The other strategy is to adopt hierarchical agglomerative co-clustering and approximate its computation in a way suited to the MapReduce programming model. The proposed approach is evaluated through a series of experiments in terms of the validity of the underlying assumptions, processing time, quality of the resulting clusters, and effect of parallelization.
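The matrix described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy triples, the helper names `build_rows` and `cosine`, and the choice of cosine similarity are not taken from the paper, and the sketch covers neither the co-clustering step nor the MapReduce implementation.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_rows(triples):
    """Build sparse rows: noun phrase -> counts over (noun phrase, verb phrase) columns.

    Each triple is (row noun phrase, verb phrase, context noun phrase).
    This layout is a hypothetical stand-in for the paper's extracted relations.
    """
    rows = defaultdict(Counter)
    for noun, verb, context_noun in triples:
        rows[noun][(context_noun, verb)] += 1
    return rows


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (an assumed measure)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data, not drawn from the paper's corpus.
triples = [
    ("smoking", "causes", "lung cancer"),
    ("tobacco use", "causes", "lung cancer"),
    ("smoking", "raises", "blood pressure"),
    ("tobacco use", "raises", "blood pressure"),
    ("exercise", "lowers", "blood pressure"),
]

rows = build_rows(triples)
similar = cosine(rows["smoking"], rows["tobacco use"])
dissimilar = cosine(rows["smoking"], rows["exercise"])
```

In this toy data, "smoking" and "tobacco use" occupy identical columns, so their rows have the same distribution, while "exercise" shares no column with either. Under the distributional hypothesis, the first pair would be a candidate for the same synonym cluster.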
Keywords
data handling; distributed programming; natural language processing; pattern clustering; MapReduce; agglomerative coclustering; big textual data; clustering synonymous noun phrases; distribution hypothesis; noun phrases; parallel distributed programming model; synonymous phrases; verb phrase; Approximation methods; Clustering algorithms; Context; Copper; Data mining; Guidelines; Programming; Distributional similarity; Hadoop/MapReduce; Parallel distributed processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691738
Filename
6691738