  • DocumentCode
    442058
  • Title
    Chinese word segmentation based on A-priori and adjacent characters
  • Author
    Wang, Ye; Huang, Shang-Teng
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., China
  • Volume
    6
  • fYear
    2005
  • fDate
    18-21 Aug. 2005
  • Firstpage
    3808
  • Abstract
    Chinese word segmentation is an important and difficult problem because written Chinese does not mark word boundaries with spaces. This paper presents a segmentation algorithm based on A-priori and adjacent characters. In the new method, adjacent-character information is used to join n-grams with their adjacent characters. Experimental results on the LDC95T13 Chinese collection show that the new method performs remarkably better than mutual-information-based methods.
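The abstract describes the method only at a high level. As a hedged illustration, the Python sketch below shows one possible reading of "joining n-grams with their adjacent characters" in the A-priori (Apriori) style: frequent character n-grams are grown level by level by appending the adjacent character, and an (n+1)-gram is counted only if its n-character suffix is also frequent (the Apriori downward-closure property). The function names `frequent_ngrams` and `segment`, the parameters `min_support` and `max_n`, and the use of greedy forward maximum matching for the final segmentation step are hypothetical choices made for illustration. This is not the authors' algorithm.

```python
from collections import Counter


def frequent_ngrams(sentences, min_support=2, max_n=4):
    """Hypothetical Apriori-style, level-wise mining of frequent character n-grams.

    A frequent n-gram is joined with its adjacent right-hand character to form
    an (n+1)-gram candidate; the candidate is counted only if its n-character
    suffix is also frequent (downward closure). Returns {ngram: count}.
    """
    # Level 1: count single characters and keep the frequent ones.
    counts = Counter(ch for s in sentences for ch in s)
    frequent = {g: c for g, c in counts.items() if c >= min_support}
    level = set(frequent)
    n = 1
    while level and n < max_n:
        candidates = Counter()
        for s in sentences:
            for i in range(len(s) - n):
                if s[i:i + n] not in level:
                    continue
                # Join the frequent n-gram with its adjacent character.
                ext = s[i:i + n + 1]
                # Apriori pruning: the suffix n-gram must also be frequent.
                if ext[1:] in level:
                    candidates[ext] += 1
        level = {g for g, c in candidates.items() if c >= min_support}
        frequent.update({g: candidates[g] for g in level})
        n += 1
    return frequent


def segment(sentence, lexicon, max_n=4):
    """Greedy forward maximum matching against the mined lexicon (assumed step)."""
    words, i = [], 0
    while i < len(sentence):
        # Try the longest lexicon entry starting at position i; fall back to one character.
        for n in range(max(1, min(max_n, len(sentence) - i)), 0, -1):
            piece = sentence[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words


corpus = ["我们学习中文", "我们喜欢中文", "他们学习中文"]
lexicon = frequent_ngrams(corpus, min_support=2, max_n=2)
print(segment("我们学习中文", lexicon, max_n=2))  # ['我们', '学习', '中文']
```

Raw frequency alone also admits non-words such as 们学 or 习中 in this toy corpus, so a practical system would likely need an additional filter (for example, one based on mutual information or adjacent-character statistics) to reject such spans.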
  • Keywords
    natural languages; word processing; A-priori based algorithm; Chinese word segmentation; adjacent characters algorithm; Computer science; Cybernetics; Dictionaries; Gallium nitride; Machine learning; Mutual information; Statistical analysis; Sun; Testing; A-priori; Word segmentation; adjacent characters; n-grams
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics
  • Conference_Location
    Guangzhou, China
  • Print_ISBN
    0-7803-9091-1
  • Type
    conf
  • DOI
    10.1109/ICMLC.2005.1527603
  • Filename
    1527603