DocumentCode :
3288549
Title :
Mixed Graph of Terms: Beyond the Bags of Words Representation of a Text
Author :
De Santo, Massimo ; Napoletano, Paolo ; Pietrosanto, Antonio ; Liguori, Consolatina ; Paciello, Vincenzo ; Polese, Francesco
Author_Institution :
DIEII, Univ. of Salerno, Salerno, Italy
fYear :
2012
fDate :
4-7 Jan. 2012
Firstpage :
1070
Lastpage :
1079
Abstract :
The main purpose of text mining techniques is to identify common patterns by observing vectors of features and then to use those patterns to make predictions. Feature vectors are usually made up of weighted words, like those used in the text retrieval field, and are obtained under the assumption that a document is a "bag of words". In this paper, however, we demonstrate that more accurate analysis and discovery of common patterns can be obtained by employing (observing) features more complex than simple weighted words. The proposed vector of features is a hierarchical structure, named a mixed Graph of Terms, composed of a directed and an undirected sub-graph of words, which can be automatically constructed from a small set of documents through the probabilistic Topic Model. The graph has demonstrated its efficiency in a classic "ad-hoc" text retrieval problem, in which the initial query is expanded with this new structured vector of features.
Keywords :
data mining; graph theory; pattern classification; probability; query processing; text analysis; ad hoc text retrieval problem; bags of words representation; common pattern analysis; common pattern identification; feature vectors; mixed graph of terms; probabilistic topic model; text mining; text representation; Educational institutions; Feature extraction; Probabilistic logic; Resource management; Semantics; Vectors; query expansion; text retrieval
fLanguage :
English
Publisher :
IEEE
Conference_Title :
System Science (HICSS), 2012 45th Hawaii International Conference on
Conference_Location :
Maui, HI
ISSN :
1530-1605
Print_ISBN :
978-1-4577-1925-7
Electronic_ISBN :
1530-1605
Type :
conf
DOI :
10.1109/HICSS.2012.432
Filename :
6149017