• DocumentCode
    3288549
  • Title
    Mixed Graph of Terms: Beyond the Bags of Words Representation of a Text
  • Author
    De Santo, Massimo ; Napoletano, Paolo ; Pietrosanto, Antonio ; Liguori, Consolatina ; Paciello, Vincenzo ; Polese, Francesco
  • Author_Institution
    DIEII, Univ. of Salerno, Salerno, Italy
  • fYear
    2012
  • fDate
    4-7 Jan. 2012
  • Firstpage
    1070
  • Lastpage
    1079
  • Abstract
    The main purpose of text mining techniques is to identify common patterns by observing feature vectors and then to use those patterns to make predictions. Feature vectors are usually made up of weighted words, like those used in text retrieval, which follow from the assumption that a document is a "bag of words". In this paper we demonstrate that features more complex than simple weighted words can yield greater accuracy in analyzing and revealing common patterns. The proposed feature vector is a hierarchical structure, named a mixed Graph of Terms, composed of a directed and an undirected sub-graph of words, which can be automatically constructed from a small set of documents through the probabilistic Topic Model. The graph has demonstrated its efficiency in a classic "ad-hoc" text retrieval problem. Here, the initial query is expanded with this new structured feature vector.
  • Keywords
    data mining; graph theory; pattern classification; probability; query processing; text analysis; ad hoc text retrieval problem; bags of words representation; common pattern analysis; common pattern identification; feature vectors; mixed graph of terms; probabilistic topic model; query processing; text mining; text representation; Data mining; Educational institutions; Feature extraction; Probabilistic logic; Resource management; Semantics; Vectors; probabilistic topic model; query expansion; text mining; text retrieval;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System Science (HICSS), 2012 45th Hawaii International Conference on
  • Conference_Location
    Maui, HI
  • ISSN
    1530-1605
  • Print_ISBN
    978-1-4577-1925-7
  • Electronic_ISBN
    1530-1605
  • Type
    conf
  • DOI
    10.1109/HICSS.2012.432
  • Filename
    6149017