  • DocumentCode
    3578794
  • Title
    Improving classification performance by extending documents terms
  • Author
    Widodo; Wibowo, Wahyu Catur
  • Author_Institution
    Fac. of Comput. Sci., Univ. of Indonesia, Jakarta, Indonesia
  • fYear
    2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Classification is a data mining technique for categorizing objects. Text classification faces renewed challenges when the documents are very short, as in social media collections. This paper proposes a method to improve classification performance on short documents: the terms of every document are expanded before the documents are classified. We use a TFIDF model, Hidden Markov Model k-means clustering, and Latent Semantic Indexing (LSI) to expand the documents. The results show that extending each document by just one term increases classification accuracy, while extending it by 2, 4, and 8 terms tends to give stable results.
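    The record does not describe how the expansion terms are chosen. As a hedged illustration only, the sketch below shows one hypothetical TF-IDF-based reading of "extending a document by k terms": each tokenized document is matched to its most cosine-similar neighbour, and up to k of that neighbour's highest-weighted terms not already present are appended. The function names (tfidf_vectors, cosine, expand_documents) and the nearest-neighbour selection rule are assumptions for illustration, not the authors' method; the HMM k-means and LSI variants are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})  # empty document -> empty vector
            continue
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_documents(docs, k):
    """Hypothetical expansion: append up to k new terms to each document,
    taken from its most similar other document (by TF-IDF cosine)."""
    vectors = tfidf_vectors(docs)
    expanded = []
    for i, doc in enumerate(docs):
        # Find the neighbour with the highest strictly positive similarity.
        best_j, best_sim = None, 0.0
        for j, v in enumerate(vectors):
            if j == i:
                continue
            s = cosine(vectors[i], v)
            if s > best_sim:
                best_j, best_sim = j, s
        if best_j is None:
            expanded.append(list(doc))  # no related document: leave unchanged
            continue
        weights = vectors[best_j]
        present = set(doc)
        # Neighbour terms not yet in the document, highest weight first,
        # ties broken alphabetically for determinism.
        candidates = sorted(
            (t for t in weights if t not in present),
            key=lambda t: (-weights[t], t),
        )
        expanded.append(list(doc) + candidates[:k])
    return expanded
```

    Usage: expand_documents(docs, k) with k set to 1, 2, 4, or 8 mirrors the expansion sizes named in the abstract; the expanded token lists would then be passed to whatever classifier is being evaluated.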
  • Keywords
    category theory; classification; data mining; hidden Markov models; indexing; pattern clustering; text analysis; LSI; TFIDF model; documents terms; hidden Markov model; k-means clustering; latent semantic indexing; object categorization; text classification; Accuracy; Bagging; Bayes methods; Hidden Markov models; Semantics; Text categorization; Hidden Markov Model k-means; Latent Semantic Indexing; extend words
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data and Software Engineering (ICODSE), 2014 International Conference on
  • Print_ISBN
    978-1-4799-8175-5
  • Type
    conf
  • DOI
    10.1109/ICODSE.2014.7062657
  • Filename
    7062657