Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm

Author

Skabar, Andrew ; Abdalgader, Khaled

Author_Institution

Dept. of Comput. Sci. & Comput. Eng., La Trobe Univ., Melbourne, VIC, Australia

Volume

25

Issue

1

fYear

2013

fDate

Jan. 2013

Firstpage

62

Lastpage

75

Abstract

In comparison with hard clustering methods, in which a pattern belongs to a single cluster, fuzzy clustering algorithms allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, since a sentence is likely to be related to more than one theme or topic present within a document or set of documents. However, because most sentence similarity measures do not represent sentences in a common metric space, conventional fuzzy clustering approaches based on prototypes or mixtures of Gaussians are generally not applicable to sentence clustering. This paper presents a novel fuzzy clustering algorithm that operates on relational input data; i.e., data in the form of a square matrix of pairwise similarities between data objects. The algorithm uses a graph representation of the data, and operates in an Expectation-Maximization framework in which the graph centrality of an object in the graph is interpreted as a likelihood. Results of applying the algorithm to sentence clustering tasks demonstrate that the algorithm is capable of identifying overlapping clusters of semantically related sentences, and that it is therefore of potential use in a variety of text mining tasks. We also include results of applying the algorithm to benchmark data sets in several other domains.
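The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm as published: the abstract names only "graph centrality" and an Expectation-Maximization framework, so the use of PageRank as the centrality measure, the membership-weighted affinity matrix, the damping factor, and the function names `pagerank` and `fuzzy_relational_cluster` are all assumptions made here for illustration. The input is a symmetric, non-negative matrix of pairwise similarities.

```python
import numpy as np


def pagerank(W, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a non-negative affinity matrix W.

    Assumption: PageRank stands in for the unspecified 'graph centrality'.
    """
    n = W.shape[0]
    col_sums = W.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalise; columns with zero weight fall back to uniform.
    P = np.where(col_sums > 0, W / safe, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r_new = (1.0 - damping) / n + damping * (P @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


def fuzzy_relational_cluster(S, k, iters=50, seed=0):
    """Hypothetical EM-style loop over a pairwise similarity matrix S.

    Returns (U, mix): U[i, m] is the membership of object i in cluster m
    (each row sums to 1); mix[m] is the mixing coefficient of cluster m.
    """
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    # Random initial memberships, normalised per object.
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    mix = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: centrality in each membership-weighted graph is
        # treated as the likelihood of the object under that cluster.
        lik = np.empty((n, k))
        for m in range(k):
            W_m = S * np.outer(U[:, m], U[:, m])  # assumed weighting
            lik[:, m] = pagerank(W_m)
        U = mix * lik
        U /= U.sum(axis=1, keepdims=True)
        # M-step: mixing coefficients are the mean memberships.
        mix = U.mean(axis=0)
    return U, mix
```

Calling `fuzzy_relational_cluster(S, k)` on an n-by-n similarity matrix yields soft memberships rather than a hard partition, so an object related to several themes can carry weight in more than one cluster. No metric-space representation of the objects is needed, only their pairwise similarities.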

Keywords

Gaussian processes; data mining; data structures; expectation-maximisation algorithm; fuzzy set theory; graph theory; matrix algebra; natural language processing; pattern clustering; text analysis; Gaussian mixtures; data representation; expectation-maximization framework; fuzzy relational clustering algorithm; graph centrality; graph representation; overlapping cluster identification; pairwise similarity matrix; relational input data; semantically related sentences; sentence similarity measures; sentence-level text clustering; text mining; Clustering algorithms; Convergence; Data models; Partitioning algorithms; Prototypes; Fuzzy relational clustering

fLanguage

English

Journal_Title

Knowledge and Data Engineering, IEEE Transactions on

Publisher

IEEE

ISSN

1041-4347

Type

jour

DOI

10.1109/TKDE.2011.205

Filename

6035706