DocumentCode
2850943
Title
Comparison of Hash Table Verses Lexical Transducer Based Implementations of Urdu Lexicon
Author
Rizvi, S. M Jafar ; Hussain, Mutawarra ; Qaiser, Naeem
Author_Institution
Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences (PIEAS), Islamabad, Pakistan. JafarRizvi@Gmail.com
fYear
2004
fDate
30-31 Dec. 2004
Firstpage
29
Lastpage
29
Abstract
A lexicon is the basis of many natural language processing applications. This paper describes and compares approaches to implementing an Urdu lexicon. A raw lexicon stored as a simple word list is expensive in both search time and space. Using a hash table with appropriate hash functions achieves fast search times, close to those of perfect hashing. Hashing yields a simpler, acceptable lexicon design at the cost of some extra space. Storing the lexicon as a trie reduces both search time and size. A further improvement is achieved by converting the trie into a directed acyclic word graph, which is then used to separate word stems from prefixes and suffixes automatically. Only high-frequency prefixes and suffixes that carry productive morphological information are retained for the final lexical transducer. The comparison reveals that the lexical transducer implementation is more complex than hashing because it requires morphological analysis, but it is efficient in both search time and storage space.
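The contrast the abstract draws between hash-based and trie-based lexicon storage can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `node_count` field, and the romanized sample words are all hypothetical, chosen only to show that a trie shares common prefixes while a hash set stores every word in full.

```python
class TrieNode:
    """One trie node: outgoing edges keyed by character, plus an end-of-word flag."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    """Hypothetical trie lexicon; node_count tracks how many nodes were allocated."""

    def __init__(self):
        self.root = TrieNode()
        self.node_count = 1  # the root itself

    def insert(self, word):
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                # Allocate a node only when the prefix has not been seen before.
                nxt = TrieNode()
                node.children[ch] = nxt
                self.node_count += 1
            node = nxt
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


# Romanized placeholder words (an assumption, not data from the paper).
words = ["kitab", "kitaben", "kitabon", "kam", "kamyab"]

hash_lexicon = set(words)   # hash table: each word stored whole
trie_lexicon = Trie()       # trie: shared prefixes stored once
for w in words:
    trie_lexicon.insert(w)

total_chars = sum(len(w) for w in words)
```

On this toy word list the trie allocates fewer nodes than the total number of characters, because prefixes such as "kitab" are stored once. Merging shared suffixes as well, as a directed acyclic word graph does, would shrink the structure further, but that step is not shown here.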
Keywords
Finite State Automata; Hash Table; Lexical Transducer; Urdu Lexicon; Application software; Automata; Costs; Frequency; Information retrieval; Morphology; Natural language processing; Speech synthesis; Synthesizers; Transducers
fLanguage
English
Publisher
IEEE
Conference_Titel
Engineering, Sciences and Technology, Student Conference On
Print_ISBN
0-7803-8871-2
Type
conf
DOI
10.1109/SCONES.2004.1564764
Filename
1564764