Title :
Suffix Tree Based WEB Information Search System and Optimal Index Algorithms
Author_Institution :
Sch. of Electron. Eng. & Comput. Sci., Peking Univ., Beijing
Abstract :
Chinese information search engines face the difficulty of segmenting Chinese words from an article. In this paper, a suffix tree based searching approach is proposed to avoid word segmentation. Suffix tree algorithms are studied and a set of optimal algorithms for index construction is proposed. Based on these algorithms, a prototype Chinese information search system is developed and applied to the Chinese Web test collection of 100 GB of Web pages (CWT-100g). The experimental results show that the system can search Chinese information without segmenting Chinese words and that the index construction time is reduced to the theoretical limit.
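The idea of searching unsegmented text through a suffix index can be illustrated with a minimal sketch. It is not the authors' algorithm: for brevity it uses a sorted suffix array (a simpler relative of the suffix tree) built naively, and the function names `build_suffix_index` and `find_all` are hypothetical. It only shows that any substring query, including one that crosses word boundaries, can be answered without segmenting the text first.

```python
def build_suffix_index(text):
    # Assumption: a naive suffix array stands in for the paper's suffix tree.
    # Sort all suffix start positions by the suffix they begin
    # (simple but slow; real systems use linear-time construction).
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text, index, query, upper):
    # Binary search over the sorted suffixes, comparing only the first
    # len(query) characters of each suffix against the query.
    m = len(query)
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[index[mid]:index[mid] + m]
        if prefix < query or (upper and prefix == query):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_all(text, index, query):
    # Return every start position of `query` in `text`, in ascending order.
    if not query:
        return []
    start = _bound(text, index, query, upper=False)
    end = _bound(text, index, query, upper=True)
    return sorted(index[start:end])


text = "北京大学在北京"
index = build_suffix_index(text)
matches = find_all(text, index, "北京")    # occurrences of a whole word
crossing = find_all(text, index, "京大")   # a query spanning a word boundary
```

Because every suffix of the text is indexed, the query `京大` is found even though it straddles two words, which is the property that makes word segmentation unnecessary in this kind of approach.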
Keywords :
Internet; Web sites; natural languages; search engines; tree data structures; Chinese Web test collection; Chinese information search engines system; Web information search system; Web pages; optimal index algorithm; suffix tree based searching approach; Cities and towns; Computer science; Continuous wavelet transforms; Cybernetics; Electronic mail; Information systems; Machine learning; Machine learning algorithms; Prototypes; System testing; Vocabulary; Search engine; information system; segmentation of Chinese words; suffix tree
Conference_Title :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
DOI :
10.1109/ICMLC.2006.259157