Author :
Ward, Charles B. ; Choi, Yejin ; Skiena, Steven ; Xavier, Eduardo C.
Author_Institution :
Comput. Sci. Dept., Stony Brook Univ., Stony Brook, NY, USA
Abstract :
Sentiment analysis is a fundamental component of text-driven monitoring and forecasting systems, in which the general sentiment towards real-world entities (e.g., people, products, organizations) is analyzed based on the sentiment signals embedded in the vast amount of web text available today. Building such systems involves several practically important problems, from data cleansing (e.g., boilerplate removal, web-spam detection) and sentiment analysis at the individual mention level (e.g., phrase-, sentence-, document-level) to the aggregation of sentiment for entity-level analysis (e.g., person, company). Most previous research in sentiment analysis, however, has focused only on individual mention-level analysis, and relatively little work has addressed the other practically important problems involved in enabling a large-scale sentiment monitoring system. In this paper, we propose Empath, a new framework for evaluating entity-level sentiment analysis. Empath leverages objective measurements of entities in various domains, such as people, companies, countries, movies, and sports, to facilitate entity-level sentiment analysis and tracking. We demonstrate the utility of Empath for the evaluation of a large-scale sentiment system by applying it to various lexicons using Lydia, our own large-scale text-analytics tool, over a corpus consisting of more than a terabyte of newspaper data. We expect that Empath will encourage research that encompasses end-to-end pipelines to enable large-scale text-driven monitoring and forecasting systems.
Keywords :
Internet; forecasting theory; monitoring; text analysis; Empath; Web text; data cleansing; entity-level sentiment analysis; forecasting systems; mention-level analysis; sentiment signals; text-driven monitoring; Companies; Dictionaries; Forecasting; Humans; Joining processes; Motion pictures; Standards