Title :
On zonal morphological approach to natural language texts processing
Author :
Shlepakov, D.V.; Shlepakov, L.N.
Author_Institution :
Inst. of Math., Acad. of Sci., Kiev, Ukraine
Abstract :
Summary form only given. We discuss some current problems of natural language processing, focusing on flexile (inflectional) languages. We propose a zonal, morphology-based model instead of the traditional word-based one. For an inflectional language, an intermediate layer of language units is needed; we take this to be the layer of morphs, which seems natural and non-restrictive. By their position within the word, we divide morphs into four categories, or zones: prefixes (P), roots (R), suffixes (S), and endings (E). We introduce a new term, semantic coverage, which is an analogue of a compact set over the domain of all possible words. We survey some aspects of the architecture of a morphological processing system and consider the modified Huffman coding used in facsimile hardware. A facsimile machine processes only runs of black and white pixels, and these runs always alternate. Such runs can be mapped to morphological zones, and the Huffman code prefixes can be redefined to fit the four-zone structure. Another way to fit the facsimile paradigm is to apply Huffman coding in two steps: the zones are joined in pairs, the coding is applied within each pair, and finally the coding is applied to the pairs themselves. The branching of the multi-level system should be kept within reasonable limits. All modifications of the basic architecture should affect only the root zone, whose variable part is regulated by an overflow-control threshold. The other zones can be treated as fixed, since they are stable morphological units of the language. Prospects and problems of processing texts in flexile languages are also discussed.
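A minimal Python sketch of the zone-alternation idea, offered as one possible reading rather than the authors' design. It assumes that each word is already segmented into exactly one (possibly empty) morph per zone and that each zone gets its own Huffman table, by analogy with the separate white-run and black-run code tables of fax modified Huffman coding. Because the zones always occur in the fixed order P, R, S, E, no zone flag is transmitted: the decoder knows which table to use next, just as a fax decoder knows that white and black runs alternate. The segmentation, the empty-zone marker, the sample words, and all function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

ZONES = ("P", "R", "S", "E")  # prefix, root, suffix, ending, in word order
EMPTY = ""  # hypothetical marker for an empty zone (like a zero-length fax run)


def huffman_code(freqs):
    """Build a prefix-free Huffman code {symbol: bitstring} from a frequency map."""
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs a one-bit code
        _, _, table = heap[0]
        return {sym: "0" for sym in table}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def build_zone_tables(words):
    """One Huffman table per zone (assumption, mirroring MH's per-colour tables)."""
    return {
        zone: huffman_code(Counter(w[i] for w in words))
        for i, zone in enumerate(ZONES)
    }


def encode(words, tables):
    # Zones always follow P, R, S, E order, so only morph codes are emitted.
    return "".join(tables[z][w[i]] for w in words for i, z in enumerate(ZONES))


def decode(bits, tables, n_words):
    inverse = {z: {c: s for s, c in t.items()} for z, t in tables.items()}
    words, pos = [], 0
    for _ in range(n_words):
        word = []
        for zone in ZONES:  # switch tables in fixed zone order
            buf = ""
            while buf not in inverse[zone]:  # prefix-free, so first match is correct
                buf += bits[pos]
                pos += 1
            word.append(inverse[zone][buf])
        words.append(tuple(word))
    return words


# Usage with a hypothetical, hand-segmented toy vocabulary (P, R, S, E).
words = [
    ("re", "play", EMPTY, "ed"),
    (EMPTY, "play", "er", "s"),
    (EMPTY, "read", "er", EMPTY),
    ("re", "read", EMPTY, "s"),
]
tables = build_zone_tables(words)
bits = encode(words, tables)
assert decode(bits, tables, len(words)) == words
```

Keeping the tables per zone means frequent endings and frequent roots do not compete for short codewords, which is the same reason fax coding keeps white and black runs apart.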
Keywords :
Huffman codes; facsimile; natural languages; text analysis; black and white pixel series; endings; facsimile hardware; modified Huffman coding; natural language texts; overflow control threshold; prefixes; roots; semantic coverage; suffixes; text processing; zonal morphological approach; Dictionaries; Facsimile; Hardware; Home appliances; Huffman coding; Mathematics; Natural language processing; Natural languages; Text processing;
Conference_Title :
Data Compression Conference, 2000. Proceedings. DCC 2000
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-0592-9
DOI :
10.1109/DCC.2000.838218