Title :
Stop Word in Readability Assessment of Thai Text
Author :
Daowadung, Patcharanut ; Chen, Yaw-Huei
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiayi Univ., Chiayi, Taiwan
Abstract :
Teachers and parents may use readability to select appropriate learning materials for primary school students. This research constructs a Thai stop word list and evaluates the impact of eliminating stop words on the readability assessment of Thai text. The corpus contains 1,188 textbook articles used by students from grade 1 to grade 6. Word segmentation, stop word list extraction, and feature selection are the preprocessing tasks performed on the articles in the corpus. Then, the term frequency and inverse document frequency (TF-IDF) weights of the selected terms are used as features for support vector machines (SVMs) to generate classification models. Experimental results show that the F-measure can reach 0.87 when identifying Thai articles suitable for middle-grade primary school students.
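As a rough illustration of the preprocessing described in the abstract, the following minimal sketch removes stop words from already-segmented documents and computes TF-IDF weights for the remaining terms. It is not the authors' implementation: the function names `remove_stop_words` and `tf_idf`, the toy English tokens, the one-word stop list, and the specific weighting (relative term frequency times the natural log of N over document frequency) are all assumptions made for the example. Thai word segmentation, feature selection, and the SVM training step are omitted.

```python
import math
from collections import Counter


def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not in the stop word set."""
    return [t for t in tokens if t not in stop_words]


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed weighting: tf = count / document length,
    idf = ln(N / document frequency).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus: placeholder English tokens standing in for
# segmented Thai words; "the" plays the role of a stop word.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran", "far"],
    ["the", "cat", "ran"],
]
stop_words = {"the"}

filtered = [remove_stop_words(d, stop_words) for d in docs]
vectors = tf_idf(filtered)
```

Each resulting dictionary could then be converted to a fixed-length feature vector and passed, together with grade labels, to an SVM classifier of the reader's choice.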
Keywords :
feature extraction; learning (artificial intelligence); natural languages; pattern classification; support vector machines; text analysis; word processing; F-measure; SVM; TF-IDF; Thai articles; Thai stop word list; Thai text; classification models generation; feature selection; inverse document frequency; learning materials; middle grades primary school students; readability assessment; stop word list extraction; term frequency; word segmentation; Educational institutions; Mathematical model; Semantics; Testing; Training; Training data; mutual information; readability; stop word list
Conference_Title :
Advanced Learning Technologies (ICALT), 2012 IEEE 12th International Conference on
Conference_Location :
Rome
Print_ISBN :
978-1-4673-1642-2
DOI :
10.1109/ICALT.2012.9