Title :
Comparison of Hash Table Verses Lexical Transducer Based Implementations of Urdu Lexicon
Author :
Rizvi, S. M Jafar ; Hussain, Mutawarra ; Qaiser, Naeem
Author_Institution :
Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences (PIEAS), Islamabad, Pakistan. JafarRizvi@Gmail.com
Abstract :
A lexicon is the basis of many natural language processing applications. This paper describes and compares approaches to implementing an Urdu lexicon. A raw lexicon stored as a simple word list is expensive in both search time and space. Using a hash table with appropriate hash functions achieves fast search times, close to those of perfect hashing. Hashing yields a simpler, acceptable lexicon design at the cost of some extra space. Storing the lexicon in a trie reduces both search time and size. A further improvement is achieved by converting the trie into a directed acyclic word graph, which is then used to automatically separate word stems from prefixes and suffixes. Only high-frequency prefixes and suffixes carrying productive morphological information are retained in the final lexical transducer. The comparison reveals that the lexical transducer implementation is more complex than hashing, because it requires morphological analysis, but it is efficient in both search time and storage space.
Keywords :
Finite State Automata; Hash Table; Lexical Transducer; Urdu Lexicon; Application software; Automata; Costs; Frequency; Information retrieval; Morphology; Natural language processing; Speech synthesis; Synthesizers; Transducers;
Conference_Title :
Engineering, Sciences and Technology, Student Conference On
Print_ISBN :
0-7803-8871-2
DOI :
10.1109/SCONES.2004.1564764