• DocumentCode
    2347784
  • Title
    Automatic filtration of multiword units
  • Author
    Liu, Ying ; Tie, Zheng
  • Author_Institution
    Dept. of Chinese Language & Literature, Tsinghua Univ., Beijing, China
  • fYear
    2010
  • fDate
    21-23 Aug. 2010
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    This paper studies how to filter multiword units. We use normalized expectation (NE) to extract multiword unit candidates from a patent corpus. The candidates are then filtered using stop words, frequency, first stop words, last stop words, and contextual entropy. The experimental results show that the precision rate of multiword units is improved by 8.7% after filtering.
  • Keywords
    information filtering; text analysis; automatic filtration; multiword units; normalized expectation; patent corpus; Irrigation; Presses; contextual entropy; extract; filtrate; multiword unit;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-6896-6
  • Type
    conf
  • DOI
    10.1109/NLPKE.2010.5587783
  • Filename
    5587783
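
The filtering stage described in the abstract can be illustrated with a minimal sketch. This record does not give the paper's formulas or thresholds, so everything below is an assumption: contextual entropy is taken to be the Shannon entropy of the words immediately to the left and right of a candidate, and the function names (`contextual_entropy`, `keep_candidate`), the stop-word set, and the `min_freq` and `min_entropy` cutoffs are hypothetical. The NE extraction step is not shown.

```python
import math
from collections import Counter


def contextual_entropy(neighbors):
    """Shannon entropy (in bits) of a list of neighboring words."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def occurrences(tokens, candidate):
    """Start indices at which the candidate n-gram occurs in tokens."""
    n = len(candidate)
    return [i for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == candidate]


def keep_candidate(tokens, candidate, stop_words, min_freq=2, min_entropy=0.5):
    """Return True if the candidate passes all (assumed) filters."""
    candidate = tuple(candidate)
    # First/last stop-word filter: reject units that begin or end with a stop word.
    if candidate[0] in stop_words or candidate[-1] in stop_words:
        return False
    starts = occurrences(tokens, candidate)
    # Frequency filter (hypothetical threshold).
    if len(starts) < min_freq:
        return False
    n = len(candidate)
    left = [tokens[i - 1] for i in starts if i > 0]
    right = [tokens[i + n] for i in starts if i + n < len(tokens)]
    # Contextual entropy filter: a real unit should appear in varied contexts
    # on both sides (hypothetical threshold).
    return (contextual_entropy(left) >= min_entropy
            and contextual_entropy(right) >= min_entropy)


tokens = ("the heat exchanger is hot a heat exchanger was used "
          "the heat exchanger of steel").split()
stop_words = {"the", "a", "of", "is", "was"}
print(keep_candidate(tokens, ("heat", "exchanger"), stop_words))
print(keep_candidate(tokens, ("the", "heat"), stop_words))
```

In this toy corpus, "heat exchanger" occurs three times with varied neighbors on both sides, so it passes, while "the heat" is rejected by the first-stop-word check. Low entropy on either side suggests that the candidate is a fragment of a longer fixed expression, which is the usual motivation for this kind of filter.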