DocumentCode
2983548
Title
Inductive Model Generation for Text Categorization Using a Bipartite Heterogeneous Network
Author
Rossi, Rafael G. ; de Paulo Faleiros, T. ; De Andrade Lopes, Alneu ; Rezende, Solange O.
Author_Institution
Univ. of Sao Paulo, Sao Carlos, Brazil
fYear
2012
fDate
10-13 Dec. 2012
Firstpage
1086
Lastpage
1091
Abstract
Algorithms for categorizing numeric data are usually applied to text categorization after a preprocessing phase that assigns weights to textual terms treated as attributes. However, due to characteristics of textual data, some data categorization algorithms are not efficient for text categorization. Characteristics such as sparsity and high dimensionality sometimes impair the quality of general-purpose classifiers. Here, we propose a text classifier based on a bipartite heterogeneous network used to represent textual document collections. The algorithm induces a classification model by assigning weights to the objects that represent terms of the textual document collection. The induced weights correspond to the influence of the terms on the classification of the documents in which they appear. The least-mean-square algorithm is used in the inductive process. Empirical evaluation on a large number of textual document collections shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM, and Naïve Bayes algorithms.
Keywords
least mean squares methods; network theory (graphs); pattern classification; text analysis; C4.5 algorithm; IMBHN algorithm; Naive Bayes algorithm; SVM algorithm; bipartite heterogeneous network; classification model; dimensionality characteristics; general purpose classifier; inductive model generation; k-NN algorithm; k-nearest neighbor; least mean square algorithm; numeric data categorization algorithm; sparsity characteristics; support vector machines; text categorization; textual data characteristics; textual document collection; textual term; Accuracy; Data models; Equations; Mathematical model; Niobium; Training; Vectors; Heterogeneous Network; Text Categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location
Brussels
ISSN
1550-4786
Print_ISBN
978-1-4673-4649-8
Type
conf
DOI
10.1109/ICDM.2012.130
Filename
6413804