DocumentCode :
260680
Title :
Document grouping with concept based discriminative analysis and feature partition
Author :
Kajapriya, S. ; Vimal Shankar, K.N.
Author_Institution :
Dept. of Comput. Sci. & Eng., V.S.B. Eng. Coll., Karur, India
fYear :
2014
fDate :
27-28 Feb. 2014
Firstpage :
1
Lastpage :
4
Abstract :
Clustering is one of the most important techniques in machine learning and data mining. Similar documents are grouped by applying clustering techniques. A similarity measure is used to determine transaction associations. Hierarchical clustering methods produce tree-structured results, while partition-based clustering models produce results in grid format. Text documents are unstructured data with high-dimensional attributes. Document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods need the cluster count (K) before the document grouping process begins, and clustering accuracy decreases drastically when an unsuitable cluster count is supplied. Document word features are automatically partitioned into two groups: discriminative words and non-discriminative words. Only the discriminative words are useful for grouping documents; the contribution of non-discriminative words confuses the clustering process and leads to poor cluster solutions. A variational inference algorithm is used to infer the document collection structure and the partition of document words at the same time. The Dirichlet Process Mixture (DPM) model is used to partition documents; it utilizes both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP), which builds on the DPM model, is used to discover the latent cluster structure and does not require the number of clusters as input. The discriminative word identification process is enhanced with a labeled document analysis mechanism. Concept relationships are analyzed with ontology support, and semantic weight analysis is used for the document similarity measure. With the support of labels and concept relations for the dimensionality reduction process, this method increases scalability.
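The abstract's point that a Dirichlet Process model needs no cluster count as input can be illustrated with a minimal sketch of the Chinese restaurant process, a standard way to draw partitions from a DP prior. This is not the paper's DPMFP model or its variational inference algorithm: it shows only the "clustering property" of the DP and leaves out the data likelihood entirely. The function name `crp_assignments` and the parameters `n_docs`, `alpha`, and `seed` are hypothetical names chosen for illustration.

```python
import random


def crp_assignments(n_docs, alpha, seed=0):
    """Draw cluster labels for n_docs items from a Chinese restaurant process.

    Assumption: alpha > 0 is the DP concentration parameter. No word
    likelihood is used, so this samples from the prior only.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of documents currently in cluster k
    labels = []  # labels[i] = cluster index assigned to document i
    for _ in range(n_docs):
        # An existing cluster is chosen in proportion to its size;
        # a brand-new cluster is opened in proportion to alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_docs=200, alpha=1.0)
print("clusters discovered:", len(counts))
print("cluster sizes:", counts)
```

The number of clusters is an outcome of the sampling rather than an argument: larger values of `alpha` tend to open more clusters. A full DPM model would multiply each weight by the likelihood of the document's words under that cluster.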
Keywords :
data mining; inference mechanisms; learning (artificial intelligence); mixture models; ontologies (artificial intelligence); pattern clustering; text analysis; variational techniques; DPM clustering model; DPMFP clustering model; Dirichlet process mixture model for feature partition; clustering property; concept based discriminative analysis; data likelihood; data mining; dimensionality cutback process; discriminative word identification process; discriminative words; document analysis mechanism; document clustering accuracy; document collection structure; document word features; document word partition; hierarchical clustering method; latent cluster structure; machine learning; nondiscriminative words; partition based clustering model; similarity measure; transaction associations; unlabeled text documents; Clustering methods; Educational institutions; Feature extraction; Inference algorithms; Partitioning algorithms; Semantics; Text mining; Database management; Dirichlet Process Mixture Model; Document Clustering; Feature Partition; Semi-Supervised; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Communication and Embedded Systems (ICICES), 2014 International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4799-3835-3
Type :
conf
DOI :
10.1109/ICICES.2014.7033763
Filename :
7033763