Title :
Using word segmentation and SVM to assess readability of Thai text for primary school students
Author :
Daowadung, Patcharanut ; Chen, Yaw-Huei
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiayi Univ., Chiayi, Taiwan
Abstract :
This research aims to develop a readability assessment technique to find appropriate Thai language reading materials for primary school students. The corpus contains 1050 articles from textbooks used by students from grade 1 to grade 6. We preprocess the articles with the Ling CD program for Thai word segmentation and use mutual information (MI) to select the most important terms in the corpus. Term frequency-inverse document frequency (TF-IDF) values are used as features for support vector machines (SVMs) to generate classification models. Experimental results show that the proposed method reaches an F-measure of 0.83 for identifying articles suitable for middle-grade primary school students.
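The abstract names the pipeline steps but gives no formulas, so the following is only a minimal sketch of the two middle steps (MI term selection and TF-IDF weighting) under stated assumptions: articles are already segmented into word lists, MI is computed between term presence and class membership, and TF-IDF is tf × ln(N/df). The function names, the toy English tokens, and the two grade labels are hypothetical and are not taken from the paper; SVM training is omitted, and the resulting vectors would be passed to an SVM library of choice.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term, target):
    """MI (in bits) between presence of `term` and membership in class `target`."""
    n = len(docs)
    counts = Counter()
    for tokens, label in zip(docs, labels):
        counts[(term in tokens, label == target)] += 1
    mi = 0.0
    # Only non-empty cells are stored, so log2 never receives zero.
    for (present, in_class), n_tc in counts.items():
        n_t = sum(v for (p, _), v in counts.items() if p == present)
        n_c = sum(v for (_, c), v in counts.items() if c == in_class)
        mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi


def select_terms(docs, labels, k):
    """Keep the k terms with the highest MI against any class (assumed criterion)."""
    vocab = {t for tokens in docs for t in tokens}
    classes = set(labels)
    score = {
        t: max(mutual_information(docs, labels, t, c) for c in classes)
        for t in vocab
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def tfidf_vector(tokens, terms, docs):
    """TF-IDF weights of `terms` for one document: tf * ln(N / df)."""
    n = len(docs)
    vec = []
    for t in terms:
        tf = tokens.count(t) / len(tokens)
        df = sum(1 for d in docs if t in d)
        vec.append(tf * math.log(n / df) if df else 0.0)
    return vec


# Hypothetical toy corpus: pre-segmented articles with made-up grade labels.
docs = [
    ["cat", "sat", "mat"],
    ["cat", "ran"],
    ["economy", "policy", "cat"],
    ["policy", "debate", "economy"],
]
labels = ["low", "low", "high", "high"]

selected = select_terms(docs, labels, k=2)
features = [tfidf_vector(d, selected, docs) for d in docs]
```

Each row of `features` is one article's vector over the selected terms; in a setup like the one the abstract describes, such vectors and their grade labels would form the training input for the SVM classifier.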
Keywords :
document handling; natural language processing; support vector machines; text analysis; word processing; Thai language reading materials; Thai text; Thai word segmentation; classification models; inverse document frequency; mutual information; primary school students; readability assessment technique; term frequency; SVM; TF-IDF; readability
Conference_Title :
Computer Science and Software Engineering (JCSSE), 2011 Eighth International Joint Conference on
Conference_Location :
Nakhon Pathom
Print_ISBN :
978-1-4577-0686-8
DOI :
10.1109/JCSSE.2011.5930115