• DocumentCode
    2259228
  • Title

    Generating and mixing feature sets from language models for sentiment classification

  • Author

    Jeong, Yoonjae ; Kim, Youngho ; Kim, Seongchan ; Myaeng, Sung-Hyon ; Oh, Hyo-Jung

  • Author_Institution
    Korea Adv. Inst. of Sci. & Technol. (KAIST), Daejeon, South Korea
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    This paper presents methods for mixing feature sets in sentence-level sentiment analysis, where a sentence is classified into one of three classes: positive, negative, and neutral. Motivated by the need to classify sentences in Korean, whose sentiment-revealing expressions tend to have different effects according to their syntactic categories, we employed a language modeling (LM) approach with 162 different LMs based on syntactic categories that are effectively combined with a logistic regression classifier. The experimental results show that this approach significantly outperforms clue-based SVM classifiers. The enumeration of feature types arising from the LMs for the logistic regression classifier allowed us to show that domain-specific models can be smoothed with a general model and that attaching a syntactic category to a feature helps improve effectiveness. The classification results are further improved by applying a clue-based classifier. The rationale behind this two-step process is to classify sentences with a classifier that is relatively conservative in picking positive and negative sentences and then to apply a high-precision classifier to the sentences in the neutral class.
  • Keywords
    classification; natural language processing; regression analysis; text analysis; clue-based classifier; language modeling; logistic regression classifier; sentence-level sentiment analysis; sentiment classification; syntactic category; Joining processes; Labeling; Logistics; Machine learning; Motion pictures; Natural languages; Support vector machine classification; Support vector machines; Text categorization; Training data; polarity classification; sentiment analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2009.5313746
  • Filename
    5313746
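
The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.7 threshold, and the clue sets are hypothetical, the class probabilities stand in for the output of the logistic regression classifier over LM features, and the second stage is simplified to a lexicon rule rather than the clue-based classifier used in the paper.

```python
from typing import Dict, Iterable, Set

# Labels used throughout: "positive", "negative", "neutral".


def conservative_first_stage(probs: Dict[str, float], threshold: float = 0.7) -> str:
    """Stage 1 (stand-in): accept positive/negative only when the
    probability is at least `threshold` (a hypothetical value);
    everything else is labeled neutral."""
    best = max(("positive", "negative"), key=lambda c: probs.get(c, 0.0))
    return best if probs.get(best, 0.0) >= threshold else "neutral"


def clue_based_second_stage(tokens: Iterable[str],
                            pos_clues: Set[str],
                            neg_clues: Set[str]) -> str:
    """Stage 2 (simplified assumption): relabel a sentence only when
    clues of exactly one polarity occur, so precision stays high."""
    tokens = list(tokens)
    has_pos = any(t in pos_clues for t in tokens)
    has_neg = any(t in neg_clues for t in tokens)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return "neutral"


def two_step_classify(tokens: Iterable[str],
                      probs: Dict[str, float],
                      pos_clues: Set[str],
                      neg_clues: Set[str],
                      threshold: float = 0.7) -> str:
    """Run stage 1; pass only sentences it left neutral to stage 2."""
    label = conservative_first_stage(probs, threshold)
    if label != "neutral":
        return label
    return clue_based_second_stage(tokens, pos_clues, neg_clues)
```

Raising the threshold makes the first stage more conservative, which sends more sentences to the second stage; the appropriate trade-off is not stated in the abstract.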