• DocumentCode
    2900529
  • Title
    Suffix Tree Based WEB Information Search System and Optimal Index Algorithms
  • Author
    Wu, Lian-long
  • Author_Institution
    Sch. of Electron. Eng. & Comput. Sci., Peking Univ., Beijing
  • fYear
    2006
  • fDate
    13-16 Aug. 2006
  • Firstpage
    4450
  • Lastpage
    4454
  • Abstract
    Chinese information search engines routinely have difficulty segmenting the Chinese words in an article. In this paper, a suffix tree based search approach is proposed that avoids word segmentation altogether. Suffix tree algorithms are studied, and a set of optimal algorithms for index construction is proposed. Based on these algorithms, a prototype Chinese information search system is developed and applied to the Chinese Web test collection of 100 GB of Web pages (CWT-100g). The experimental results show that the system can search Chinese information without segmenting Chinese words and that the index construction speed reaches the theoretical limit.
  • Keywords
    Internet; Web sites; natural languages; search engines; tree data structures; Chinese Web test collection; Chinese information search engines system; Web information search system; Web pages; optimal index algorithm; suffix tree based searching approach; Cities and towns; Computer science; Continuous wavelet transforms; Cybernetics; Electronic mail; Information systems; Machine learning; Machine learning algorithms; Prototypes; System testing; Vocabulary; Search engine; information system; segmentation of Chinese words; suffix tree
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2006 International Conference on
  • Conference_Location
    Dalian, China
  • Print_ISBN
    1-4244-0061-9
  • Type
    conf
  • DOI
    10.1109/ICMLC.2006.259157
  • Filename
    4028855