Title :
Probabilistic Clustering and Classification for Textual Data: An Online and Incremental Approach
Author :
Fredes Rodrigues, Thiago ; Engel, Paulo Martins
Author_Institution :
Inf. Inst., Univ. Fed. do Rio Grande do Sul, Porto Alegre, Brazil
Abstract :
Textual data holds a large amount of information but is unstructured, so algorithms that can process it and transform it into a format useful for solving real-world problems are desirable. Tasks such as the organization and exploration of large document collections can benefit from such methods. This work proposes an incremental, online, probabilistic clustering algorithm for textual data, based on a mixture of Multinomial distributions. The main advantage of the model is that it needs only a single pass over the training data to learn from it. As more texts are processed, the model refines its structure to better represent the data stream.
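The abstract does not give the update rules, so the following is only a minimal sketch of what a single-pass, incremental mixture of Multinomials could look like, not the authors' algorithm. Everything in it is an assumption made for illustration: the class name `OnlineMultinomialMixture`, the smoothing constant `alpha`, the `novelty_threshold` (average per-token log-likelihood below which a new component is created), and the soft-count update itself.

```python
import math
from collections import Counter


class OnlineMultinomialMixture:
    """Hypothetical single-pass mixture of Multinomials (illustrative only)."""

    def __init__(self, vocab_size, alpha=0.1, novelty_threshold=-3.0):
        self.vocab_size = vocab_size                # assumed known in advance
        self.alpha = alpha                          # additive smoothing (assumption)
        self.novelty_threshold = novelty_threshold  # per-token log-likelihood cutoff
        self.word_counts = []  # per component: expected word counts
        self.totals = []       # per component: sum of expected word counts
        self.mass = []         # per component: accumulated responsibility

    def _log_likelihood(self, k, doc):
        # Smoothed Multinomial log-likelihood, dropping the multinomial
        # coefficient because it is the same for every component.
        denom = self.totals[k] + self.alpha * self.vocab_size
        return sum(n * math.log((self.word_counts[k][w] + self.alpha) / denom)
                   for w, n in doc.items())

    def _posteriors(self, doc):
        total_mass = sum(self.mass)
        logs = [math.log(self.mass[k] / total_mass) + self._log_likelihood(k, doc)
                for k in range(len(self.mass))]
        top = max(logs)
        exps = [math.exp(v - top) for v in logs]  # subtract max for stability
        norm = sum(exps)
        return [e / norm for e in exps]

    def learn(self, tokens):
        """Process one document exactly once, then discard it."""
        doc = Counter(tokens)
        length = sum(doc.values())
        if length == 0:
            return
        best = -math.inf
        if self.mass:
            best = max(self._log_likelihood(k, doc)
                       for k in range(len(self.mass))) / length
        if best < self.novelty_threshold:
            # No component explains the document well: start a new one.
            self.word_counts.append(Counter(doc))
            self.totals.append(float(length))
            self.mass.append(1.0)
            return
        # Otherwise spread the document's counts over the components
        # in proportion to their posterior responsibilities.
        for k, resp in enumerate(self._posteriors(doc)):
            for w, n in doc.items():
                self.word_counts[k][w] += resp * n
            self.totals[k] += resp * length
            self.mass[k] += resp

    def predict(self, tokens):
        """Return the index of the most probable component for a document."""
        if not self.mass:
            raise ValueError("model has not seen any documents yet")
        post = self._posteriors(Counter(tokens))
        return max(range(len(post)), key=post.__getitem__)


model = OnlineMultinomialMixture(vocab_size=6)
stream = [
    ["goal", "match", "team", "goal"],
    ["stock", "market", "price", "market"],
    ["team", "goal", "match"],
    ["price", "stock", "market"],
]
for document in stream:
    model.learn(document)
```

In this sketch each document is seen once and only per-component sufficient statistics are kept, which is one way to read the "single pass" property described in the abstract. The number of components grows with the stream rather than being fixed in advance, and how aggressively it grows depends entirely on the assumed threshold.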
Keywords :
pattern classification; pattern clustering; text analysis; incremental clustering algorithm; multinomial distributions; online clustering algorithm; probabilistic clustering algorithm; textual data classification; Clustering algorithms; Data models; Mathematical model; Probabilistic logic; Training; Vectors; Vocabulary; Document Classification; Document Clustering; Incremental Learning; Online Learning; Topic Modeling
Conference_Title :
Intelligent Systems (BRACIS), 2014 Brazilian Conference on
Conference_Location :
Sao Paulo
DOI :
10.1109/BRACIS.2014.59