• DocumentCode
    2452276
  • Title
    A hybrid method to segment words
  • Author
    Dai, Yubiao ; Ren, Xueli
  • Author_Institution
    Dept. of Comput. Sci. & Eng., QuJing Normal Univ., Qujing, China
  • fYear
    2012
  • fDate
    16-18 July 2012
  • Firstpage
    1131
  • Lastpage
    1134
  • Abstract
    Word segmentation is the foundation of machine translation, text classification, and information searching. A hybrid method is proposed that combines dictionary-based word segmentation using reverse maximum matching with statistical word segmentation using suffix arrays. The input texts are first segmented with the dictionary-based reverse maximum matching method. Two-way suffix arrays are then constructed and longest common prefixes are computed; candidate words are selected by applying a threshold and then filtered using mutual information in order to obtain the true words. Ambiguous texts are filtered using information entropy. The experiment shows that the accuracy of word segmentation can exceed 97%.
  • Keywords
    language translation; natural language processing; pattern classification; text analysis; common prefix; hybrid method; information entropy; information searching; input texts; machine translation; suffix array; text classification; word segmentation; Accuracy; Arrays; Dictionaries; Information filters; Matched filters; Sorting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Audio, Language and Image Processing (ICALIP), 2012 International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-0173-2
  • Type
    conf
  • DOI
    10.1109/ICALIP.2012.6376786
  • Filename
    6376786
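
The dictionary-based step named in the abstract, reverse maximum matching, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `reverse_maximum_match`, the `max_len` parameter, and the toy dictionary are assumptions made for illustration, and the suffix-array, mutual-information, and entropy stages are not shown.

```python
def reverse_maximum_match(text, dictionary, max_len=4):
    """Segment text right to left, taking the longest dictionary word
    that ends at the current position (hypothetical helper)."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first and shrink until one matches.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback,
            # so the scan is guaranteed to make progress.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # Words were collected from the end of the text.
    return words


if __name__ == "__main__":
    toy_dictionary = {"研究", "研究生", "生命", "起源"}  # assumed toy data
    print(reverse_maximum_match("研究生命起源", toy_dictionary))
```

Scanning from the right lets this toy input resolve to 研究 / 生命 / 起源 rather than starting with 研究生, which is the kind of ambiguity a forward scan would hit first.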