• DocumentCode
    1826659
  • Title

    Research on Improved Algorithm for Chinese Word Segmentation Based on Markov Chain

  • Author

    Baomao, Pang ; Haoshan, Shi

  • Author_Institution
    Coll. of Electron. Inf., Northwest Polytech. Univ., Xi'an, China
  • Volume
    1
  • fYear
    2009
  • fDate
    18-20 Aug. 2009
  • Firstpage
    236
  • Lastpage
    238
  • Abstract
    Chinese word segmentation is an important technique for Chinese Web data mining. After reviewing several existing Chinese word segmentation methods, this paper proposes an improved algorithm. The algorithm updates the dictionary using a two-way Markov chain and segments text with an improved forward maximum matching method based on word frequency statistics. Simulation shows that the algorithm can segment a given text quickly and accurately.
  • Keywords
    Internet; Markov processes; data mining; natural language processing; pattern matching; text analysis; Chinese Web data mining; Chinese word segmentation; dictionary; forward maximum matching method; text analysis; two-way Markov chain; word frequency statistic; Algorithm design and analysis; Data mining; Dictionaries; Educational institutions; Frequency; Information security; Natural languages; Space technology; Statistical analysis; Statistics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Assurance and Security, 2009. IAS '09. Fifth International Conference on
  • Conference_Location
    Xi'an
  • Print_ISBN
    978-0-7695-3744-3
  • Type

    conf

  • DOI
    10.1109/IAS.2009.317
  • Filename
    5284270
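
The abstract names forward maximum matching as the basis of the segmentation step. As a rough illustration only, the sketch below shows the plain, textbook form of that method: at each position it takes the longest dictionary word that matches. It is not the paper's improved variant, which this record does not describe; the word-frequency weighting and the two-way Markov chain dictionary update are omitted. The function name `forward_max_match`, the `max_len` parameter, and the sample dictionary are assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Plain forward maximum matching (illustrative sketch, not the paper's variant).

    Scans left to right and, at each position, emits the longest substring
    found in `dictionary`. A single character is emitted when nothing matches.
    """
    if max_len is None:
        # Assumption: the longest dictionary entry bounds the search window.
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(1, max_len)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    sample_dictionary = {"研究", "研究生", "生命", "起源"}  # hypothetical entries
    print(forward_max_match("研究生命起源", sample_dictionary))
```

With this sample dictionary the greedy match takes "研究生" first and leaves "命" stranded, a well-known ambiguity of the plain method.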