DocumentCode :
2963464
Title :
Stop Word in Readability Assessment of Thai Text
Author :
Daowadung, Patcharanut ; Chen, Yaw-Huei
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiayi Univ., Chiayi, Taiwan
fYear :
2012
fDate :
4-6 July 2012
Firstpage :
497
Lastpage :
499
Abstract :
Teachers and parents may use readability to select appropriate learning materials for primary school students. This research constructs a Thai stop word list and evaluates the impact of eliminating stop words on the readability assessment of Thai text. The corpus contains 1,188 textbook articles used by students from grade 1 to grade 6. Word segmentation, stop word list extraction, and feature selection are the preprocessing tasks performed on the articles in the corpus. The term frequency and inverse document frequency (TF-IDF) of the selected terms are then used as features for support vector machines (SVMs) to generate classification models. Experimental results show that the F-measure can reach 0.87 when identifying Thai articles suitable for middle-grade primary school students.
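The feature step described in the abstract (drop stop words, then weight the remaining terms by TF-IDF) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tf_idf`, the English placeholder tokens, and the weighting variant (relative term frequency times log(N/df)) are all assumptions, since the abstract does not give the exact formula. Documents are assumed to be already segmented into words, and the SVM training stage is not shown.

```python
import math
from collections import Counter

def tf_idf(docs, stop_words):
    """Return one {term: weight} dict per document.

    docs: list of token lists (assumed to be already word-segmented).
    stop_words: set of tokens to drop before weighting.
    Weighting is an assumed common variant: (count / doc length) * log(N / df).
    """
    # Remove stop words from each pre-segmented document.
    filtered = [[t for t in doc if t not in stop_words] for doc in docs]
    n_docs = len(filtered)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in filtered for term in set(doc))
    weights = []
    for doc in filtered:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus; real input would be segmented Thai text.
docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
stop_words = {"the"}
weights = tf_idf(docs, stop_words)
```

In this toy corpus, "sat" occurs in only one document and so receives a higher weight in the first document than "cat", which occurs in two. The resulting per-document weight dictionaries would then be turned into feature vectors for a classifier such as an SVM.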
Keywords :
feature extraction; learning (artificial intelligence); natural languages; pattern classification; support vector machines; text analysis; word processing; F-measure; SVM; TF-IDF; Thai articles; Thai stop word list; Thai text; classification models generation; feature selection; inverse document frequency; learning materials; middle grades primary school students; readability assessment; stop word list extraction; term frequency; word segmentation; Educational institutions; Mathematical model; Semantics; Testing; Training; Training data; mutual information; readability; stop word list
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Learning Technologies (ICALT), 2012 IEEE 12th International Conference on
Conference_Location :
Rome
Print_ISBN :
978-1-4673-1642-2
Type :
conf
DOI :
10.1109/ICALT.2012.9
Filename :
6268161