• DocumentCode
    189205
  • Title
    Probabilistic Clustering and Classification for Textual Data: An Online and Incremental Approach
  • Author
    Fredes Rodrigues, Thiago; Engel, Paulo Martins
  • Author_Institution
    Inf. Inst., Univ. Fed. do Rio Grande do Sul, Porto Alegre, Brazil
  • fYear
    2014
  • fDate
    18-22 Oct. 2014
  • Firstpage
    288
  • Lastpage
    293
  • Abstract
    Given the amount of information stored in textual data and the fact that it is unstructured, algorithms that can process it and transform it into a format useful for solving real-world problems are desirable. Tasks such as organizing and exploring large document collections can benefit from such methods. This work proposes an incremental, online, probabilistic clustering algorithm for textual data, based on a mixture of multinomial distributions. The main advantage of the model is that it needs only a single pass over the training data to learn from it. As more texts are processed, the model improves its structure to better represent the data stream.
  • Keywords
    pattern classification; pattern clustering; text analysis; incremental clustering algorithm; multinomial distributions; online clustering algorithm; probabilistic clustering algorithm; textual data classification; Clustering algorithms; Data models; Mathematical model; Probabilistic logic; Training; Vectors; Vocabulary; Document Classification; Document Clustering; Incremental Learning; Online Learning; Topic Modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems (BRACIS), 2014 Brazilian Conference on
  • Conference_Location
    Sao Paulo
  • Type
    conf
  • DOI
    10.1109/BRACIS.2014.59
  • Filename
    6984845
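
Illustrative note on the Abstract: the record does not give the paper's update equations, so the following is only a minimal sketch of the general idea the abstract describes (a mixture of multinomial distributions updated in a single pass over a document stream). It is not the authors' algorithm. The class name `OnlineMultinomialMixture`, the method `partial_fit`, and the parameters `novelty_threshold`, `smoothing`, and `vocab_size` are all assumptions introduced here for illustration; in particular, creating a new component when the average per-word score falls below a threshold is a stand-in rule, not something stated in the abstract.

```python
import math
from collections import Counter


class OnlineMultinomialMixture:
    """Single-pass clustering sketch (hypothetical, not the paper's method)."""

    def __init__(self, novelty_threshold=-6.0, smoothing=0.1, vocab_size=1000):
        # Assumed rule: open a new component when the best average
        # per-word log score falls below this threshold.
        self.novelty_threshold = novelty_threshold
        self.smoothing = smoothing      # additive smoothing for word probabilities
        self.vocab_size = vocab_size    # assumed fixed vocabulary size
        self.word_counts = []           # per component: soft word counts
        self.totals = []                # per component: total soft word count
        self.doc_mass = []              # per component: accumulated responsibilities

    def _log_likelihood(self, k, doc):
        # Multinomial log-likelihood of the document under component k
        # (the multinomial coefficient is constant across components and dropped).
        denom = self.totals[k] + self.smoothing * self.vocab_size
        return sum(
            n * math.log((self.word_counts[k][w] + self.smoothing) / denom)
            for w, n in doc.items()
        )

    def partial_fit(self, tokens):
        """Process one document once; return the index of its best component."""
        doc = Counter(tokens)
        length = sum(doc.values())
        if length == 0:
            return None

        total_mass = sum(self.doc_mass)
        scores = [
            math.log(self.doc_mass[k] / total_mass) + self._log_likelihood(k, doc)
            for k in range(len(self.doc_mass))
        ]

        # No component explains the document well enough: create a new one.
        if not scores or max(scores) / length < self.novelty_threshold:
            self.word_counts.append(Counter(doc))
            self.totals.append(float(length))
            self.doc_mass.append(1.0)
            return len(self.doc_mass) - 1

        # Otherwise compute responsibilities and update every component softly.
        best = max(scores)
        weights = [math.exp(s - best) for s in scores]
        norm = sum(weights)
        for k, weight in enumerate(weights):
            resp = weight / norm
            for word, n in doc.items():
                self.word_counts[k][word] += resp * n
            self.totals[k] += resp * length
            self.doc_mass[k] += resp
        return scores.index(best)


# Each document is seen exactly once, in stream order.
stream = [
    ["goal", "match", "team", "goal"],
    ["stock", "market", "bank", "stock"],
    ["team", "goal", "match"],
    ["bank", "market", "stock"],
]
model = OnlineMultinomialMixture()
labels = [model.partial_fit(doc) for doc in stream]
```

Because each call to `partial_fit` touches a document only once and keeps only sufficient statistics (soft counts), memory grows with the number of components and vocabulary entries rather than with the number of documents processed.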