• DocumentCode
    507554
  • Title
    Automatic Word Segmentation for Chinese Classics of Tea Based on Tree-Pruning
  • Author
    Fang, Miao; Jiang, Yi; Zhao, Qi; Jiang, Xin
  • Author_Institution
    Northeastern Univ. at Qinhuangdao, Qinhuangdao, China
  • Volume
    1
  • fYear
    2009
  • fDate
    Nov. 30, 2009 - Dec. 1, 2009
  • Firstpage
    438
  • Lastpage
    441
  • Abstract
    Automatic word segmentation is vital for reading, understanding, and translating classical texts. However, the classics contain many special terms, allusions, and proper names, and these make word segmentation difficult. Taking the classics of tea as the research subject, the paper proposes a method that uses likelihood ratio statistics to identify two-character, three-character, and multi-character word candidates. It then segments the classics of tea automatically with a tree-pruning algorithm. The computational complexity of the tree-pruning algorithm is O(LN), where L is the number of Chinese characters in the longest word. Experiments show that the method achieves better word-segmentation results.
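    The abstract does not give the likelihood ratio formula or the tree-pruning procedure. The Python sketch below therefore substitutes two standard techniques and does not reproduce the authors' method. The first is Dunning's log-likelihood ratio (G²), used to score two-character candidates. The second is a dynamic-programming segmenter that tries at most L word lengths at each of the N end positions, which shows one way an O(LN) bound on dictionary lookups can arise. Several elements are hypothetical choices for illustration only: the function names, the min_count filter, the 3.84 threshold, and the fewest-words objective.

```python
import math
from collections import Counter


def _binom_ll(k, n, p):
    """Log-likelihood of k successes in n Bernoulli trials with probability p."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clamp so log() never sees 0
    return k * math.log(p) + (n - k) * math.log(1 - p)


def llr_bigram(c12, c1, c2, n):
    """Dunning's G^2 (= -2 log lambda) for the character pair (w1, w2).

    c12: count of the pair w1w2; c1: pairs starting with w1;
    c2: pairs ending with w2; n: total number of adjacent pairs.
    """
    p = c2 / n                    # P(w2) under independence
    p1 = c12 / c1                 # P(w2 | w1)
    p2 = (c2 - c12) / (n - c1)    # P(w2 | not w1)
    return 2 * (_binom_ll(c12, c1, p1) + _binom_ll(c2 - c12, n - c1, p2)
                - _binom_ll(c12, c1, p) - _binom_ll(c2 - c12, n - c1, p))


def two_char_candidates(text, threshold=3.84, min_count=2):
    """Hypothetical candidate filter: frequent, positively associated pairs
    whose G^2 reaches `threshold` (3.84 ~ chi-square, 1 d.f., p = 0.05)."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    n = len(pairs)
    counts = Counter(pairs)
    first = Counter(p[0] for p in pairs)
    second = Counter(p[1] for p in pairs)
    result = {}
    for pair, c12 in counts.items():
        c1, c2 = first[pair[0]], second[pair[1]]
        if c12 < min_count or c1 == n:
            continue  # too rare, or P(w2 | not w1) undefined
        if c12 / c1 <= (c2 - c12) / (n - c1):
            continue  # G^2 also rewards repulsion; keep attraction only
        score = llr_bigram(c12, c1, c2, n)
        if score >= threshold:
            result[pair] = score
    return result


def segment(text, lexicon, max_len):
    """Fewest-words segmentation by dynamic programming (not the paper's
    tree pruning). Each of the N end positions tries at most max_len (= L)
    word lengths, so the loop makes O(L * N) dictionary lookups."""
    n = len(text)
    best = [0] + [math.inf] * n   # best[i]: fewest words covering text[:i]
    back = [0] * (n + 1)          # back[i]: length of the last word in text[:i]
    for i in range(1, n + 1):
        for length in range(1, min(max_len, i) + 1):
            word = text[i - length:i]
            # single characters are always allowed so every text is segmentable
            if (length == 1 or word in lexicon) and best[i - length] + 1 < best[i]:
                best[i] = best[i - length] + 1
                back[i] = length
    words, i = [], n
    while i > 0:                  # follow back-pointers from the end
        words.append(text[i - back[i]:i])
        i -= back[i]
    return words[::-1]


if __name__ == "__main__":
    print(two_char_candidates("茶经甲茶经乙茶经丙丁戊己庚辛"))
    print(segment("陆羽著茶经", {"陆羽", "茶经"}, max_len=3))
    # → ['陆羽', '著', '茶经']
```

    Pairs accepted by such a filter could populate the lexicon used by the segmenter. The paper's own three-character and multi-character statistics are not reproduced in this sketch.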
  • Keywords
    computational complexity; trees (mathematics); word processing; Chinese classics; automatic word segmentation; computation complexity; likelihood ratio statistics; tree-pruning algorithm; Dictionaries; Frequency; Gaussian distribution; History; Knowledge acquisition; Statistical distributions; Statistics; classics of tea; segmentation; tree-pruning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-0-7695-3888-4
  • Type
    conf
  • DOI
    10.1109/KAM.2009.80
  • Filename
    5362115