• DocumentCode
    2963464
  • Title
    Stop Word in Readability Assessment of Thai Text
  • Author
    Daowadung, Patcharanut; Chen, Yaw-Huei
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Chiayi Univ., Chiayi, Taiwan
  • fYear
    2012
  • fDate
    4-6 July 2012
  • Firstpage
    497
  • Lastpage
    499
  • Abstract
    Teachers and parents may use readability to select appropriate learning materials for primary school students. This research constructs a Thai stop word list and evaluates the impact of eliminating stop words on the readability assessment of Thai text. The corpus contains 1,188 textbook articles used by students from grade 1 to grade 6. Word segmentation, stop word list extraction, and feature selection are the preprocessing tasks performed on the articles in the corpus. Then, the term frequency and inverse document frequency (TF-IDF) of the selected terms are used as features for support vector machines (SVMs) to generate classification models. Experimental results show that the F-measure can reach 0.87 when identifying Thai articles suitable for middle-grade primary school students.
  • Keywords
    feature extraction; learning (artificial intelligence); natural languages; pattern classification; support vector machines; text analysis; word processing; F-measure; SVM; TF-IDF; Thai articles; Thai stop word list; Thai text; classification models generation; feature selection; inverse document frequency; learning materials; middle grades primary school students; readability assessment; stop word list extraction; term frequency; word segmentation; Educational institutions; Mathematical model; Semantics; Testing; Training; Training data; mutual information; readability; stop word list
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Learning Technologies (ICALT), 2012 IEEE 12th International Conference on
  • Conference_Location
    Rome
  • Print_ISBN
    978-1-4673-1642-2
  • Type
    conf
  • DOI
    10.1109/ICALT.2012.9
  • Filename
    6268161
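
The abstract describes removing stop words and then weighting the remaining terms by TF-IDF before classification. The sketch below illustrates only that weighting step, under stated assumptions: the function name `tf_idf`, the toy English tokens, and the stop word set are hypothetical, the input is assumed to be already word-segmented, and the formula tf × ln(N / df) is one common TF-IDF variant rather than the one confirmed by the paper. The SVM training step is not shown.

```python
import math
from collections import Counter


def tf_idf(docs, stop_words=frozenset()):
    """Return one {term: weight} dict per document.

    docs: list of token lists (assumed already word-segmented).
    stop_words: terms to drop before weighting (hypothetical list).
    """
    # Drop stop words from every document first.
    filtered = [[t for t in doc if t not in stop_words] for doc in docs]
    n_docs = len(filtered)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in filtered for term in set(doc))
    weights = []
    for doc in filtered:
        counts = Counter(doc)
        total = len(doc)
        # Relative term frequency times natural-log inverse document frequency.
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy data standing in for segmented articles; not taken from the paper's corpus.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran", "far"],
    ["the", "cat", "ran"],
]
weights = tf_idf(docs, stop_words={"the"})
```

In this toy run, a term that appears in only one document ("sat") receives a higher weight than one shared across documents ("cat"). The resulting per-document weight dictionaries are the kind of feature vectors that could then be passed to a classifier.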