• DocumentCode
    39764
  • Title
    Identifying Features in Opinion Mining via Intrinsic and Extrinsic Domain Relevance
  • Author
    Zhen Hai; Kuiyu Chang; Jung-Jae Kim; Yang, C.C.
  • Author_Institution
    N4-B3C-14 DISCO Lab., Nanyang Technol. Univ., Singapore, Singapore
  • Volume
    26
  • Issue
    3
  • fYear
    2014
  • fDate
    Mar-14
  • Firstpage
    623
  • Lastpage
    634
  • Abstract
    The vast majority of existing approaches to opinion feature extraction rely on mining patterns only from a single review corpus, ignoring the nontrivial disparities in word distributional characteristics of opinion features across different corpora. In this paper, we propose a novel method to identify opinion features from online reviews by exploiting the difference in opinion feature statistics across two corpora: one domain-specific corpus (i.e., the given review corpus) and one domain-independent corpus (i.e., the contrasting corpus). We capture this disparity via a measure called domain relevance (DR), which characterizes the relevance of a term to a text collection. We first extract a list of candidate opinion features from the domain review corpus by defining a set of syntactic dependence rules. For each extracted candidate feature, we then estimate its intrinsic-domain relevance (IDR) and extrinsic-domain relevance (EDR) scores on the domain-dependent and domain-independent corpora, respectively. Candidate features that are less generic (EDR score less than a threshold) and more domain-specific (IDR score greater than another threshold) are then confirmed as opinion features. We call this interval thresholding approach the intrinsic and extrinsic domain relevance (IEDR) criterion. Experimental results on two real-world review domains show that the proposed IEDR approach outperforms several other well-established methods in identifying opinion features.
  • Keywords
    data mining; feature extraction; statistical analysis; text analysis; EDR score estimation; IDR; IEDR criterion; contrasting corpus; domain review corpus; domain-dependent corpora; domain-independent corpora; extrinsic-domain relevance score estimation; feature identification; interval thresholding approach; intrinsic and extrinsic domain relevance criterion; intrinsic-domain relevance score estimation; one domain-independent corpus; one domain-specific corpus; opinion feature extraction; opinion feature statistics; opinion mining; pattern mining; syntactic dependence rules; text collection; word distributional characteristics; Batteries; Data mining; Dispersion; Educational institutions; Feature extraction; Hidden Markov models; Syntactics; Chinese; Information search and retrieval; natural language processing; opinion feature
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2013.26
  • Filename
    6427744
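
The interval thresholding step described in the abstract can be illustrated with a minimal sketch. It keeps a candidate only when its IDR score exceeds one threshold and its EDR score falls below another. The relevance function used here (the fraction of documents containing the term) is a simple stand-in chosen for illustration and is not the paper's DR measure; the function names, toy corpora, and threshold values are all hypothetical.

```python
def document_frequency_relevance(term, corpus):
    """Stand-in relevance score (an assumption, NOT the paper's DR measure):
    the fraction of documents in `corpus` that contain `term`."""
    if not corpus:
        return 0.0
    hits = sum(1 for doc in corpus if term in doc)
    return hits / len(corpus)


def iedr_filter(candidates, domain_corpus, contrast_corpus,
                idr_min, edr_max, relevance=document_frequency_relevance):
    """Keep candidates with IDR > idr_min and EDR < edr_max."""
    confirmed = []
    for term in candidates:
        idr = relevance(term, domain_corpus)    # score on the review corpus
        edr = relevance(term, contrast_corpus)  # score on the contrasting corpus
        if idr > idr_min and edr < edr_max:
            confirmed.append(term)
    return confirmed


# Toy data: each document is a set of tokens (hypothetical examples).
domain_corpus = [
    {"battery", "life", "good"},
    {"screen", "good"},
    {"battery", "price"},
]
contrast_corpus = [
    {"good", "weather"},
    {"price", "market"},
    {"life", "story"},
    {"election"},
]
candidates = ["battery", "screen", "good", "price", "life"]

confirmed = iedr_filter(candidates, domain_corpus, contrast_corpus,
                        idr_min=0.3, edr_max=0.2)
print(confirmed)
```

In this toy run, "good", "price", and "life" are rejected because they also appear in a quarter of the contrasting documents, while "battery" and "screen" pass both thresholds. Any other relevance measure with the same `(term, corpus)` signature can be passed through the `relevance` parameter.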