• DocumentCode
    2850943
  • Title

    Comparison of Hash Table Verses Lexical Transducer Based Implementations of Urdu Lexicon

  • Author

    Rizvi, S. M Jafar ; Hussain, Mutawarra ; Qaiser, Naeem

  • Author_Institution
    Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences (PIEAS), Islamabad, Pakistan. JafarRizvi@Gmail.com
  • fYear
    2004
  • fDate
    30-31 Dec. 2004
  • Firstpage
    29
  • Lastpage
    29
  • Abstract
    A lexicon is the basis of many natural language processing applications. This paper describes and compares approaches to implementing an Urdu lexicon. A raw lexicon stored as a simple word list is expensive in both search time and space. Using a hash table with appropriate hash functions, fast search times close to those of perfect hashing are achieved. Hashing yields a simpler, acceptable lexicon design at the cost of some extra space. Storing the lexicon as a trie reduces both search time and size. Further improvement is achieved by converting the trie into a directed acyclic word graph, which is then used to automatically separate word stems from prefixes and suffixes. Only high-frequency prefixes and suffixes carrying productive morphological information are retained for the final lexical transducer. The comparison reveals that the lexical transducer implementation is more complex than hashing, owing to its morphological analysis requirement, but it is efficient in both search time and storage space.
  • Keywords
    Finite State Automata; Hash Table; Lexical Transducer; Urdu Lexicon; Application software; Automata; Costs; Frequency; Information retrieval; Morphology; Natural language processing; Speech synthesis; Synthesizers; Transducers
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering, Sciences and Technology, Student Conference On
  • Print_ISBN
    0-7803-8871-2
  • Type

    conf

  • DOI
    10.1109/SCONES.2004.1564764
  • Filename
    1564764
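
The abstract contrasts three lexicon representations: a hash table, a trie, and a directed acyclic word graph (DAWG) obtained from the trie. The following is a minimal sketch of that contrast, not the authors' implementation. The romanized word list and every function name (`build_trie`, `trie_contains`, `count_nodes`, `minimize`) are hypothetical, chosen only for illustration, and the sketch stops short of the morphological analysis that a lexical transducer requires.

```python
# Illustrative sketch only: the word list and all names are hypothetical,
# not taken from the paper.

def build_trie(words):
    """Build a trie as nested dicts; the key "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def trie_contains(root, word):
    """Follow one edge per character; works on the trie and on the DAWG."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return "$" in node


def count_nodes(node):
    """Count nodes of a trie (call before minimizing, while it is a tree)."""
    return 1 + sum(count_nodes(child) for child in node.values())


def minimize(node, registry):
    """Merge identical subtrees bottom-up, turning the trie into a DAWG.

    `registry` maps a node signature to its single canonical node, so its
    final size equals the number of DAWG nodes.
    """
    for ch in list(node):
        node[ch] = minimize(node[ch], registry)
    # Children are already canonical, so their ids identify them uniquely.
    signature = tuple(sorted((ch, id(child)) for ch, child in node.items()))
    return registry.setdefault(signature, node)


# Placeholder romanized forms that share stems and suffixes.
words = ["likhna", "likhta", "parhna", "parhta", "sunna", "sunta"]

hash_lexicon = set(words)          # hash-table lexicon: one entry per word
trie = build_trie(words)           # trie lexicon: shared prefixes
trie_nodes = count_nodes(trie)

registry = {}
dawg = minimize(trie, registry)    # DAWG lexicon: shared prefixes and suffixes
dawg_nodes = len(registry)

print("hash entries:", len(hash_lexicon))
print("trie nodes:", trie_nodes)
print("DAWG nodes:", dawg_nodes)
```

In this toy example the shared endings collapse into common nodes during minimization, which is the structural property the abstract relies on when it speaks of separating stems from prefixes and suffixes. Lookup cost in the trie and the DAWG grows with word length rather than with lexicon size.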