DocumentCode :
118539
Title :
Native Language Identification using probabilistic graphical models
Author :
Nicolai, Garrett ; Islam, M.A. ; Greiner, Russell
Author_Institution :
Dept. of Comput. Sci., Univ. of Alberta, Edmonton, AB, Canada
fYear :
2014
fDate :
13-15 Feb. 2014
Firstpage :
1
Lastpage :
6
Abstract :
Native Language Identification (NLI) is the task of identifying the native language of an author from a text written in a second language. Support Vector Machines (SVMs) and Maximum Entropy learners are the most common methods used to solve this problem, but we consider it from the point of view of probabilistic graphical models. We hypothesize that graphical models are well suited to this task, as they can capture feature inter-dependencies that cannot be exploited by SVMs. Using progressively more connected graphical models, we show that these models outperform SVMs on reduced feature sets. Furthermore, on full feature sets, even naïve Bayes improves on the accuracy of SVMs, from 82.06% to 83.41%, on a 5-language classification task.
Keywords :
Bayes methods; graph theory; learning (artificial intelligence); natural language processing; pattern classification; probability; text analysis; 5-language classification task; NLI; feature inter-dependencies; feature sets; naive Bayes method; native language identification; probabilistic graphical models; second language; Communications technology; Bayesian Methods; Machine Learning; SVM; TAN;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical Information and Communication Technology (EICT), 2013 International Conference on
Conference_Location :
Khulna
Print_ISBN :
978-1-4799-2297-0
Type :
conf
DOI :
10.1109/EICT.2014.6777864
Filename :
6777864