DocumentCode
1995444
Title
MIKA: A tagged corpus for modern standard Arabic and colloquial sentiment analysis
Author
Ibrahim, Hossam S. ; Abdou, Sherif M. ; Gheith, Mervat
Author_Institution
Comput. Sci. Dept., Inst. of Stat. Studies & Res. (ISSR), Cairo Univ., Cairo, Egypt
fYear
2015
fDate
9-11 July 2015
Firstpage
353
Lastpage
358
Abstract
Sentiment analysis (SA) and opinion mining (OM) have become a field of interest that has attracted considerable research attention over the last decade, owing to the growing volume of internet documents (especially online reviews and comments) on social media such as blogs and social networks. Many attempts have been made to build corpora for SA, because such resources are a key factor in SA and OM systems. However, the need for these resources persists, especially for morphologically rich languages (MRLs) such as Arabic. In this paper, we present MIKA, a multi-genre tagged corpus of modern standard Arabic (MSA) and colloquial Arabic. MIKA is manually collected and annotated at the sentence level with semantic orientation (positive, negative, or neutral). A rich set of linguistically motivated features (contextual intensifiers, contextual shifters, and negation handling), syntactic features for conflicting phrases, and others are used in the annotation process. Our data focus on MSA and Egyptian dialectal Arabic. We report the effort of manually building and annotating our sentiment corpus using different types of data, such as tweets and Arabic microblogs (hotel reservations, product reviews, and TV program comments).
Keywords
Internet; data mining; natural language processing; social networking (online); text analysis; Arabic microblogs; Egyptian dialectal Arabic; Internet documents; MIKA; colloquial sentiment analysis; contextual intensifiers; contextual shifter; linguistically motivated features; modern standard Arabic sentiment analysis; morphologically-rich language; multigenre tagged corpus; negation handling; opinion mining; social media; syntactic features; Blogs; Data mining; Internet; Pragmatics; Sentiment analysis; Standards; Syntactics; Arabic corpus; opinion mining; polarity strength; sentiment analysis; sentiment polarity;
fLanguage
English
Publisher
IEEE
Conference_Titel
Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
Conference_Location
Kolkata
Type
conf
DOI
10.1109/ReTIS.2015.7232904
Filename
7232904