Regional Information Center for Science and Technology (RICEST) - Feature selection for detecting language levels in L2 English Learners

DocumentCode :

1885549

Title :

Feature selection for detecting language levels in L2 English Learners

Author :

Podgornik, Stella

Author_Institution :

Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK

fYear :

2012

fDate :

5-7 Sept. 2012

Firstpage :

Lastpage :

Abstract :

This study analyses features that would enable classifiers to detect language levels in adult second language (L2) English learners. A total of 46 speech samples from speakers of 15 different native (L1) languages were selected from the Learning Prosody in a Foreign Language (LeaP) corpus [1]. Using different groupings of features from the spoken L2 (English), a Support Vector Machine (SVM) was trained and the speakers were classified into three categories: c1, c2, and s1. These categories correspond to beginner, intermediate, and advanced levels of the target language, English, and use the same category names assigned by the human annotators of the LeaP corpus. The features are grouped into four sub-categories: sentence, syllable, duration, and pitch. Count features, such as sentence word count and sentence article count, had the greatest influence on the system, while the sentence features had the second greatest influence. Surprisingly, most of the pitch features had no effect on the accuracy. A small common word list was also used and proved to be very helpful. The edit distance measure of the sentences with the common words removed had a positive effect; measurable differences could be found between sentences with and without the common words included. Because the training and testing sets were small, the grouping of the speakers' L1 languages had a significant effect on the accuracy of the classification predictions: certain combinations of L1 training and test sets gave higher accuracy than others. The classification predictions varied by as much as 40%.
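The edit distance feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which strings are compared or at what granularity, so this example assumes a word-level Levenshtein distance between a learner's transcribed sentence and a reference sentence. The `COMMON_WORDS` list, the helper names, and the two sentences are hypothetical and are not taken from the paper or the LeaP corpus.

```python
# Hypothetical common word list; the paper's actual list is not given in the abstract.
COMMON_WORDS = {"the", "a", "an", "of", "to", "and"}


def edit_distance(a, b):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def tokens(sentence, drop_common=False):
    """Lower-case whitespace tokenisation, optionally removing common words."""
    words = sentence.lower().split()
    if drop_common:
        words = [w for w in words if w not in COMMON_WORDS]
    return words


# Invented example sentences (assumption: reference text vs. learner transcript).
reference = "the tiger ran to the river"
spoken = "tiger run to a river"

with_common = edit_distance(tokens(spoken), tokens(reference))
without_common = edit_distance(tokens(spoken, True), tokens(reference, True))
print(with_common, without_common)
```

In this toy case, removing the common words separates article errors from the content-word error, which is one plausible reason the two variants of the measure could give different signals to a classifier.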

Keywords :

computer aided instruction; linguistics; natural languages; pattern classification; string matching; support vector machines; L1 native languages; L1 training; L2 English learners; LeaP; SVM; adult second language English learners; classification predictions; count features; duration feature; edit distance measurement; feature analysis; feature selection; foreign language corpus; language level detection; learning prosody; pitch feature; sentence features; speech samples; spoken L2 secondary language; support vector machine; syllable feature; test sets; Accuracy; Manuals; Support vector machines; Testing; Training; Vectors; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computational Intelligence (UKCI), 2012 12th UK Workshop on

Conference_Location :

Edinburgh

Print_ISBN :

978-1-4673-4391-6

Type :

conf

DOI :

10.1109/UKCI.2012.6335766

Filename :

6335766

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1885549