DocumentCode :
2900529
Title :
Suffix Tree Based WEB Information Search System and Optimal Index Algorithms
Author :
Wu, Lian-long
Author_Institution :
Sch. of Electron. Eng. & Comput. Sci., Peking Univ., Beijing
fYear :
2006
fDate :
13-16 Aug. 2006
Firstpage :
4450
Lastpage :
4454
Abstract :
Chinese information search engines routinely face the difficulty of segmenting an article into Chinese words. This paper proposes a suffix tree based search approach that avoids word segmentation altogether. The suffix tree algorithms are studied, and a set of optimal algorithms for index building is proposed. Based on these algorithms, a prototype Chinese information search system is developed and applied to the Chinese Web test collection of 100 GB of Web pages (CWT-100g). The experimental results show that the system can search Chinese information without segmenting Chinese words and that the index build time is reduced to the theoretical limit.
Keywords :
Internet; Web sites; natural languages; search engines; tree data structures; Chinese Web test collection; Chinese information search engines system; Web information search system; Web pages; optimal index algorithm; suffix tree based searching approach; Cities and towns; Computer science; Continuous wavelet transforms; Cybernetics; Electronic mail; Information systems; Machine learning; Machine learning algorithms; Prototypes; Search engines; System testing; Tree data structures; Vocabulary; Web pages; Search engine; information system; segmentation of Chinese words; suffix tree;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
Type :
conf
DOI :
10.1109/ICMLC.2006.259157
Filename :
4028855