• DocumentCode
    542316
  • Title

    Automatic new word extraction method

  • Author

    Shi, Qin ; Shen, Li Qin ; Chai, Hai Xin

  • Author_Institution
    IBM China Research Laboratory, China
  • Volume
    1
  • fYear
    2002
  • fDate
    13-17 May 2002
  • Abstract
    New words are very difficult to extract automatically in languages whose written text has no word boundaries, such as Chinese and Japanese. In this paper, we present a statistical method to extract new words from a large corpus with no word boundaries. Based on the Generalized Suffix Tree (GST) data structure, we define NWP (New Word Pattern) and SBP (Segmentation Boundary Pattern) to separate input strings into small pieces, and offer a practical and efficient algorithm to get the proper words from the GST.
  • Keywords
    Manuals;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
  • Conference_Location
    Orlando, FL, USA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7402-9
  • Type

    conf

  • DOI
    10.1109/ICASSP.2002.5743876
  • Filename
    5743876