• DocumentCode
    2879734
  • Title
    Improving Thai educational Web page classification using inverse class frequency
  • Author
    Lertnattee, Verayuth ; Theeramunkong, Thanaruk
  • Author_Institution
    Fac. of Pharmacy, Silpakorn Univ., Nakorn Pathom, Thailand
  • Volume
    2
  • fYear
    2005
  • fDate
    12-14 Oct. 2005
  • Firstpage
    817
  • Lastpage
    820
  • Abstract
    Automatic text classification for a Web collection is a challenging task, especially when the language is not English, as with Thai. However, most Thai educational Web pages include English terms because of their technical nature. Many technical terms and typing errors, in both Thai and English, are found on university Web sites. Most previous work on text categorization applied term frequency and inverse document frequency to represent the importance of terms. In this paper, we use inverse class frequency instead of inverse document frequency in centroid-based text categorization because it works well on a collection with a large number of unique terms. The experimental results show that inverse class frequency is useful, especially when it is applied to both prototype and query vectors.
  • Keywords
    Internet; educational computing; text analysis; word processing; Thai educational Web page classification; automatic text classification; centroid-based text categorization; inverse class frequency; Bayesian methods; Electronic mail; Frequency; Natural languages; Prototypes; Statistics; Support vector machine classification; Support vector machines; Text categorization; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications and Information Technology, 2005. ISCIT 2005. IEEE International Symposium on
  • Print_ISBN
    0-7803-9538-7
  • Type
    conf
  • DOI
    10.1109/ISCIT.2005.1566992
  • Filename
    1566992
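
The abstract describes replacing inverse document frequency with inverse class frequency (ICF) in a centroid-based classifier, applied to both prototype (class centroid) and query vectors. The following is a minimal sketch of that idea, not the authors' implementation. The record does not give the paper's formulas, so the ICF definition used here, log(number of classes / number of classes containing the term), is an assumption, as are the function names, the cosine similarity measure, and the toy token lists.

```python
import math
from collections import Counter, defaultdict


def icf_weights(docs_by_class):
    """Assumed ICF: log(#classes / #classes in which the term occurs)."""
    n_classes = len(docs_by_class)
    class_freq = Counter()
    for docs in docs_by_class.values():
        terms_in_class = set()
        for doc in docs:
            terms_in_class.update(doc)
        class_freq.update(terms_in_class)  # each class counts a term once
    return {t: math.log(n_classes / c) for t, c in class_freq.items()}


def weighted_vector(tokens, icf):
    """Term frequency times ICF; terms unseen in training are dropped."""
    tf = Counter(tokens)
    return {t: f * icf[t] for t, f in tf.items() if t in icf}


def normalize(vec):
    """Scale a sparse vector to unit length (empty if the norm is zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return {}
    return {t: w / norm for t, w in vec.items()}


def train_centroids(docs_by_class):
    """Build one normalized prototype vector per class."""
    icf = icf_weights(docs_by_class)
    centroids = {}
    for label, docs in docs_by_class.items():
        total = defaultdict(float)
        for doc in docs:
            for t, w in normalize(weighted_vector(doc, icf)).items():
                total[t] += w
        centroids[label] = normalize(total)
    return centroids, icf


def classify(tokens, centroids, icf):
    """Weight the query with the same ICF, then pick the closest centroid."""
    query = normalize(weighted_vector(tokens, icf))

    def cosine(a, b):
        return sum(w * b.get(t, 0.0) for t, w in a.items())

    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Hypothetical toy collection: token lists grouped by class label.
docs_by_class = {
    "pharmacy": [["drug", "dose", "course"], ["drug", "clinic", "course"]],
    "engineering": [["circuit", "signal", "course"], ["circuit", "robot", "course"]],
}
centroids, icf = train_centroids(docs_by_class)
print(classify(["drug", "course", "course"], centroids, icf))
```

In this sketch a term that occurs in every class, such as "course" in the toy data, receives an ICF of zero and so contributes nothing to either the prototype or the query vector, however often it appears.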