Title :
Design and implementation of text filtering with no semantic accidental injury
Author :
Yan, Danfeng ; Liu, Jia ; Yang, Fangchun
Author_Institution :
State Key Lab. of Networking & Switching Technol., Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
Information filtering on the Internet refers to finding and filtering bad words in large-scale web text, where accuracy and efficiency are the main concerns. This paper focuses on filtering mixed Chinese and English text. It proposes the No Semantic Accidental Injury Filter (NSAIF) algorithm, a Chinese and English text filtering algorithm designed to avoid semantic accidental injury. NSAIF is based on the Aho-Corasick (AC) algorithm but avoids space expansion by using dynamic memory allocation. It applies to both Chinese and English text by using one-byte storage. It uses the longest match principle to find the words that should be filtered in a trie augmented with failure pointers. The algorithm shows good time and space performance on test data sets of different sizes and has high theoretical and practical value.
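The abstract names the building blocks (a trie with failure pointers, one-byte storage, on-demand memory allocation, longest match) without showing how they fit together. The following is a minimal sketch of a generic byte-level Aho-Corasick filter under stated assumptions, not the authors' NSAIF implementation: the class name ByteTrieFilter, the use of UTF-8 as the byte encoding, the per-node dictionaries standing in for dynamic allocation, and the "*" masking policy are all illustrative choices that do not come from the paper.

```python
from collections import deque


class ByteTrieFilter:
    """Hypothetical byte-level Aho-Corasick filter (illustrative only)."""

    def __init__(self, words):
        # Node i is described by three parallel lists. Child tables are
        # dicts keyed by a single byte and grow only when an edge is added,
        # instead of reserving a fixed 256-slot array per node.
        self.children = [{}]
        self.fail = [0]
        self.longest = [0]  # byte length of the longest word ending here
        for word in words:
            self._add(word.encode("utf-8"))  # assumption: UTF-8 bytes
        self._build_failure_links()

    def _add(self, pattern):
        if not pattern:
            return
        node = 0
        for byte in pattern:
            nxt = self.children[node].get(byte)
            if nxt is None:
                nxt = len(self.children)
                self.children.append({})
                self.fail.append(0)
                self.longest.append(0)
                self.children[node][byte] = nxt
            node = nxt
        self.longest[node] = max(self.longest[node], len(pattern))

    def _build_failure_links(self):
        # Breadth-first traversal: depth-1 nodes keep fail = 0 (the root).
        queue = deque(self.children[0].values())
        while queue:
            node = queue.popleft()
            for byte, child in self.children[node].items():
                f = self.fail[node]
                while f and byte not in self.children[f]:
                    f = self.fail[f]
                self.fail[child] = self.children[f].get(byte, 0)
                # Inherit the longest word reachable through the fail link.
                self.longest[child] = max(
                    self.longest[child], self.longest[self.fail[child]]
                )
                queue.append(child)

    def filter(self, text, mask="*"):
        data = text.encode("utf-8")
        hit = bytearray(len(data))  # 1 where a byte belongs to a match
        node = 0
        for i, byte in enumerate(data):
            while node and byte not in self.children[node]:
                node = self.fail[node]
            node = self.children[node].get(byte, 0)
            n = self.longest[node]
            if n:
                # Longest match ending at byte i covers any shorter ones.
                hit[i - n + 1 : i + 1] = b"\x01" * n
        out, pos = [], 0
        for ch in text:
            # Mask whole characters, so multi-byte characters stay intact.
            out.append(mask if hit[pos] else ch)
            pos += len(ch.encode("utf-8"))
        return "".join(out)


if __name__ == "__main__":
    f = ByteTrieFilter(["bad", "badword", "坏话"])
    print(f.filter("a badword and 坏话 here"))  # a ******* and ** here
```

Because every node stores only the longest word length reachable through its failure chain, the scan marks the longest match ending at each position in one pass over the bytes. How NSAIF itself decides which matches would cause "semantic accidental injury" is not specified in the abstract and is not modeled here.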
Keywords :
Internet; information filtering; text analysis; Aho-Corasick algorithm; Chinese text filtering; English text filtering; dynamic memory allocation; large-scale Web text; no semantic accidental injury filter; Algorithm design and analysis; Encoding; Filtering algorithms; Injuries; Matched filters; Semantics; AC; Chinese and English; Semantic accidental injury; longest match principle; text filtering
Conference_Title :
Broadband Network and Multimedia Technology (IC-BNMT), 2011 4th IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-61284-158-8
DOI :
10.1109/ICBNMT.2011.6155896