• DocumentCode
    2815643
  • Title
    An approach to improving the quality of part-of-speech tagging of Chinese text
  • Author
    Qian, Yi-li; Zheng, Jia-heng

  • Author_Institution
    Dept. of Comput. Sci., Shanxi Univ., Taiyuan, China
  • Volume
    2
  • fYear
    2004
  • fDate
    5-7 April 2004
  • Firstpage
    183
  • Abstract
    The disambiguation of multicategory words is one of the difficulties in part-of-speech tagging, and it greatly affects the processing quality of corpora. To address this problem, we describe an approach to automatically correcting the part-of-speech tags of multicategory words. It acquires correction rules for multicategory words from correctly tagged corpora, based on rough set theory and data mining, and then uses these rules to automatically correct the part-of-speech tags of multicategory words in a corpus. In closed and open tests on a corpus of 500,000 Chinese characters, corpus accuracy increased by 11.32% and 5.97%, respectively.
  • Keywords
    data mining; natural languages; rough set theory; speech synthesis; Chinese text; multicategory words; part-of-speech tagging; speech quality; Computer errors; Computer science; Data analysis; Error correction; Expert systems; Machine learning; Rough sets; Statistics; Tagging
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on
  • Print_ISBN
    0-7695-2108-8
  • Type
    conf
  • DOI
    10.1109/ITCC.2004.1286628
  • Filename
    1286628