Title :
A compression algorithm using integrated record information for translation dictionaries
Author :
Fuketa, M. ; Atlam, El-Sayed ; Morita, K. ; Oono, M. ; Aoe, Jun-Ichi
Author_Institution :
Dept. of Inf. Sci. & Intelligent Syst., Tokushima Univ., Japan
Abstract :
The trie is a well-known structure for retrieving entries from natural language dictionaries. As a variety of natural language processing systems have been developed, several types of dictionaries stored on a computer's hard disk have come to share common information. This paper presents a method of integrating these dictionaries into one. Common information is packed into a single record, while each field of the integrated record remains accessible through index tables. The integrated dictionaries contain many long strings, such as compound words, idioms and frozen phrases, which take up much space for a huge set of keys when stored in the trie. A compression scheme is proposed that replaces long strings with the corresponding leaf node numbers of the trie. Experimental observations show that the new method is more practical and efficient than previous ones.
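The idea of replacing long strings with leaf node numbers can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the trie layout or the index tables, so the pointer-based trie, the parent links, and the names `Node`, `Trie`, `key_of`, `compress` and `expand` are all assumptions made for illustration. The point shown is only that a record field can hold a small integer in place of a long string, and that the string can be recovered by walking back from the numbered node to the root.

```python
class Node:
    """One trie node; parent/char let us rebuild a key from its last node."""
    __slots__ = ("children", "parent", "char", "leaf_no")

    def __init__(self, parent=None, char=""):
        self.children = {}      # char -> Node
        self.parent = parent    # None only for the root
        self.char = char        # edge label leading into this node
        self.leaf_no = None     # set when a key ends here


class Trie:
    """Hypothetical trie that numbers the node at which each key ends.

    In this sketch a key that is a prefix of another key ends at an
    interior node; we still call its number a "leaf number".
    """

    def __init__(self):
        self.root = Node()
        self.leaves = []        # leaf number -> terminal Node

    def insert(self, key):
        """Store key (if new) and return its leaf number."""
        node = self.root
        for ch in key:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = Node(node, ch)
                node.children[ch] = nxt
            node = nxt
        if node.leaf_no is None:
            node.leaf_no = len(self.leaves)
            self.leaves.append(node)
        return node.leaf_no

    def key_of(self, leaf_no):
        """Recover the key by following parent links up to the root."""
        chars = []
        node = self.leaves[leaf_no]
        while node.parent is not None:
            chars.append(node.char)
            node = node.parent
        return "".join(reversed(chars))


def compress(trie, strings):
    """Replace each long string in a record field by its leaf number."""
    return [trie.insert(s) for s in strings]


def expand(trie, leaf_numbers):
    """Inverse of compress: turn leaf numbers back into strings."""
    return [trie.key_of(n) for n in leaf_numbers]


# Usage: a record field listing idioms stores integers, not the strings.
trie = Trie()
idioms = ["kick the bucket", "kick off"]
packed = compress(trie, idioms)
assert expand(trie, packed) == idioms
```

Because the idioms are already keys of the trie, the record pays for each long string only once, in the trie itself. Every further reference costs one integer, at the price of a walk up the trie whenever the string is needed.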
Keywords :
data compression; dictionaries; language translation; natural languages; tree data structures; tree searching; compression algorithm; experiment; hard disk; index tables; integrated record information; language translation dictionaries; natural language dictionaries; tree data structure; trie structure; Buildings; Compression algorithms; Dictionaries; Dynamic compiler; Hard disks; Information science; Intelligent systems; Natural language processing; Natural languages; Tail;
Conference_Title :
Systems, Man and Cybernetics, 2002 IEEE International Conference on
Print_ISBN :
0-7803-7437-1
DOI :
10.1109/ICSMC.2002.1173352