DocumentCode
2900529
Title
Suffix Tree Based WEB Information Search System and Optimal Index Algorithms
Author
Wu, Lian-long
Author_Institution
Sch. of Electron. Eng. & Comput. Sci., Peking Univ., Beijing
fYear
2006
fDate
13-16 Aug. 2006
Firstpage
4450
Lastpage
4454
Abstract
Chinese information search engines consistently face the difficulty of segmenting Chinese words in an article. In this paper, a suffix tree based searching approach is proposed that avoids word segmentation altogether. Suffix tree algorithms are studied, and a set of optimal algorithms for index building is proposed. Based on these algorithms, a prototype Chinese information search system is developed and applied to the Chinese Web test collection of 100 GB of Web pages (CWT-100g). The experimental results show that the system can search Chinese information without segmenting Chinese words and that the index build time is reduced to the theoretical limit.
Keywords
Internet; Web sites; natural languages; search engines; tree data structures; Chinese Web test collection; Chinese information search engines system; Web information search system; Web pages; optimal index algorithm; suffix tree based searching approach; Cities and towns; Computer science; Continuous wavelet transforms; Cybernetics; Electronic mail; Information systems; Machine learning; Machine learning algorithms; Prototypes; Search engines; System testing; Tree data structures; Vocabulary; Web pages; Search engine; information system; segmentation of Chinese words; suffix tree
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location
Dalian, China
Print_ISBN
1-4244-0061-9
Type
conf
DOI
10.1109/ICMLC.2006.259157
Filename
4028855