  • DocumentCode
    2957718
  • Title
    Research on the Construction and Filter Method of Stop-word List in Text Preprocessing
  • Author
    Yao, Zhou ; Ze-wen, Cao
  • Author_Institution
    Sci. & Technol. on Inf. Syst. Eng. Lab., Nat. Univ. of Defense Technol., Changsha, China
  • Volume
    1
  • fYear
    2011
  • fDate
    28-29 March 2011
  • Firstpage
    217
  • Lastpage
    221
  • Abstract
    In the preprocessing stage of text mining, a stop-word list is used to filter the word-segmentation results of text documents, giving a first reduction in the dimensionality of the text feature space. This paper summarizes the definition of stop-words and the principles and methods for extracting them, and constructs a customized Chinese-English stop-word list from a classical stop-word list, adjusted for differences in the documents' domain. Three filter algorithms were designed and implemented for stop-word filtering, and their efficiency was compared. The experiments indicated that the hash-filter method was the fastest.
  • Keywords
    data mining; information filtering; natural languages; text analysis; word processing; Chinese-English stop-word list; extraction principle; filter algorithm; hash filter method; text document; text feature space; text mining; text preprocessing; Algorithm design and analysis; Filtering algorithms; Indexes; Information filters; Switches; hash algorithm; stop-word list; stopword filter
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computation Technology and Automation (ICICTA), 2011 International Conference on
  • Conference_Location
    Shenzhen, Guangdong
  • Print_ISBN
    978-1-61284-289-9
  • Type
    conf
  • DOI
    10.1109/ICICTA.2011.64
  • Filename
    5750595
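
The hash-filter idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe their three algorithms in detail, so the function names, the tiny mixed Chinese-English stop-word list, and the linear-scan baseline below are all assumptions chosen for illustration. The sketch only shows why a hash-based membership test tends to be fast: each token is checked in roughly constant time on average, whereas scanning a plain list costs time proportional to the list length.

```python
# Hypothetical sketch: all names and the sample stop-word list are
# illustrative assumptions, not taken from the paper.

# A frozenset is hash-based, so membership tests are O(1) on average.
STOP_WORDS = frozenset({"the", "a", "of", "is", "的", "了", "是"})


def filter_linear(tokens, stop_list):
    """Assumed baseline: scan a plain list for every token.

    Each lookup costs O(len(stop_list)).
    """
    return [t for t in tokens if t not in stop_list]


def filter_hash(tokens, stop_set=STOP_WORDS):
    """Hash-based filter: drop tokens whose lowercase form is in the set."""
    return [t for t in tokens if t.lower() not in stop_set]


# Tokens as they might come out of a word-segmentation step.
tokens = ["The", "构建", "的", "stop-word", "list", "is", "useful"]
filtered = filter_hash(tokens)
print(filtered)
```

Lowercasing before the lookup makes the English entries case-insensitive and leaves Chinese tokens unchanged. A real list would be loaded from a file and tuned to the document domain, as the abstract suggests.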