Title :
All-Character Index Dictionary
Author :
Yin, Wensheng ; Guo, Feifei
Author_Institution :
Sch. of Mech. Sci. & Eng., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
The design of the dictionary index structure is the basis of Chinese information processing, and its properties greatly influence the effectiveness of Chinese word segmentation. In this paper, a hash table is first established for all commonly used characters, so that every character in each word can be found quickly. Then, for each character, the number of words containing that character and the character's compositional relationship within each of those words are recorded in a word chain, forming the all-character index structure. Next, the paper discusses methods for constructing and maintaining the dictionary and presents algorithms for dictionary construction, word addition, and word deletion. Finally, a Chinese word segmentation algorithm based on the all-character index dictionary is proposed, and the dictionary is compared with a traditional dictionary in terms of construction, query speed, and functionality.
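
The structure described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the class `AllCharIndexDict`, its method names, and the choice of a Python `dict` of sets to stand in for the hash table and word chains are not taken from the paper. The `segment` method uses plain forward maximum matching as a stand-in, since the abstract does not specify the authors' segmentation algorithm.

```python
from collections import defaultdict


class AllCharIndexDict:
    """Hypothetical all-character index: every character of every word is a key."""

    def __init__(self, words=()):
        # char -> set of (word, position of char in that word).
        # The set stands in for the "word chain" named in the abstract.
        self.index = defaultdict(set)
        self.words = set()
        for w in words:
            self.add(w)

    def add(self, word):
        """Insert a word and register it under each of its characters."""
        if not word or word in self.words:
            return False
        self.words.add(word)
        for pos, ch in enumerate(word):
            self.index[ch].add((word, pos))
        return True

    def delete(self, word):
        """Remove a word and drop any character entry left with an empty chain."""
        if word not in self.words:
            return False
        self.words.remove(word)
        for pos, ch in enumerate(word):
            chain = self.index[ch]
            chain.discard((word, pos))
            if not chain:
                del self.index[ch]
        return True

    def words_containing(self, ch):
        """All words that contain ch at any position."""
        return {w for w, _ in self.index.get(ch, ())}

    def word_count(self, ch):
        """Number of distinct words that contain ch."""
        return len(self.words_containing(ch))

    def segment(self, text):
        """Forward maximum matching (an assumed stand-in algorithm)."""
        out = []
        i = 0
        while i < len(text):
            best = text[i]  # fall back to a single character
            for word, pos in self.index.get(text[i], ()):
                # Only entries where the character starts the word can match here.
                if pos == 0 and len(word) > len(best) and text.startswith(word, i):
                    best = word
            out.append(best)
            i += len(best)
        return out
```

Because every character is indexed, a lookup such as `words_containing("词")` can find words in which the character is not the first one, which a first-character index cannot answer directly. The sketch trades extra memory (one entry per character occurrence) for that ability.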
Keywords :
dictionaries; indexing; word processing; Chinese information processing; Chinese word segmentation; all-character index dictionary; construction methods; dictionary adding algorithms; dictionary constructing algorithms; dictionary deleting algorithms; dictionary index structure; maintenance methods; Communication standards; Design engineering; Encyclopedias; Explosions; Information processing; Mechanical factors; Natural languages; Shape; all-character index; dictionary; first-character index; index
Conference_Title :
Information Engineering and Computer Science, 2009. ICIECS 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4994-1
DOI :
10.1109/ICIECS.2009.5367176