DocumentCode :
1908815
Title :
Data Quality Controlling for Cross-Lingual Sentiment Classification
Author :
Shoushan Li ; Yunxia Xue ; Zhongqing Wang ; Sophia Yat Mei Lee ; Chu-Ren Huang
Author_Institution :
Natural Language Process. Lab., Soochow Univ., Suzhou, China
fYear :
2013
fDate :
17-19 Aug. 2013
Firstpage :
125
Lastpage :
128
Abstract :
Cross-lingual sentiment classification aims to perform sentiment classification in one language (the target language) with the help of resources from another language (the source language). Previous studies tend to use all available data in the source language, yet using all the data is observed to perform no better, or even worse, than using a portion of good data. In this paper, we propose a novel task, data quality controlling in the source language, which selects high-quality samples from the source language. To tackle this task, we propose two kinds of data quality measurements: intra- and extra-quality measurements, which are implemented with certainty and similarity measurements, respectively. Empirical studies demonstrate the effectiveness of the proposed approach to data quality controlling in the source language.
Keywords :
natural language processing; pattern classification; certainty measurements; cross-lingual sentiment classification; data quality control; data quality measurements; similarity measurements; source language; target language; Accuracy; DVD; Natural language processing; Noise measurement; Silicon; Testing; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location :
Urumqi
Type :
conf
DOI :
10.1109/IALP.2013.43
Filename :
6646019