DocumentCode :
3025308
Title :
The Research of Chinese Automatic Word Segmentation In Hierarchical Model Dictionary Binary Tree
Author :
Xiangang, Luo ; Jin, Luo ; Zhong, Xie
Author_Institution :
Fac. of Inf. Eng., China Univ. of Geosci., Wuhan, China
fYear :
2009
fDate :
25-26 April 2009
Firstpage :
321
Lastpage :
324
Abstract :
With the continuous development and growing popularity of the Internet, the amount of information online is growing explosively. How to find the information we need correctly and quickly in this mass of data has come to the fore. Against this background, Internet search engines have developed rapidly. This article describes the general principles and common technologies of search engines and, on this basis, analyzes the existing technology of Chinese automatic word segmentation to arrive at a Chinese word segmentation algorithm, "wide-based segmentation of the largest positive scan." Following full-text indexing technology, the information search engine uses both a by-word index and a words index. It then retrieves the information that meets users' requirements through high-speed set computation. Finally, a model of the information search engine is researched and implemented using a hierarchical model lexicon binary tree.
Keywords :
indexing; information retrieval; natural language processing; search engines; text analysis; trees (mathematics); Chinese automatic word segmentation; Internet; hierarchical model dictionary binary tree; information search engine; lexicon binary tree; word index; Binary trees; Data engineering; Databases; Dictionaries; Geology; Humans; IP networks; Information retrieval; Search engines; Web and internet services; Chinese Automatic Word Segmentation; hierarchical model; search engine;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database Technology and Applications, 2009 First International Workshop on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3604-0
Type :
conf
DOI :
10.1109/DBTA.2009.27
Filename :
5207750