Title :
On Using Classical Poetry Structure for Indian Language Post-Processing
Author :
Namboodiri, Anoop M. ; Narayanan, P.J. ; Jawahar, C.V.
Author_Institution :
Int. Inst. of Inf. Technol., Hyderabad
Abstract :
Post-processors are critical to the performance of language recognizers such as OCR engines and speech recognizers. Dictionary-based post-processing commonly employs either an algorithmic approach or a statistical approach. Other linguistic features are not exploited for this purpose, and the language analysis is largely limited to the prose form. This paper proposes a framework that uses the rich metrical and formal structure of classical poetic forms in Indian languages to post-process the output of a recognizer such as an OCR engine. We show that the structure present in the form of the vrtta and prasa can be used efficiently to disambiguate some cases that may be difficult for an OCR. The approach is efficient, complementary to other post-processing approaches, and can be used in conjunction with them.
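The abstract does not spell out an algorithm, so the following is only a minimal illustrative sketch of how a metrical constraint could prune OCR alternatives, not the authors' method. It assumes the vrtta fixes each syllable position as light (laghu, "L") or heavy (guru, "G"), and that the recognizer supplies a list of candidate syllables per position with their weights already computed. The function name `prune_by_meter`, the romanized syllables, and the meter string "GGL" are all hypothetical.

```python
from typing import List, Tuple

# (syllable text, weight): weight is "L" for light (laghu) or "G" for heavy (guru).
Candidate = Tuple[str, str]


def prune_by_meter(lattice: List[List[Candidate]], meter: str) -> List[List[str]]:
    """Keep, at each syllable position, only the candidates whose weight fits the meter."""
    if len(lattice) != len(meter):
        raise ValueError("lattice and meter must cover the same number of syllables")
    pruned = []
    for options, expected in zip(lattice, meter):
        fitting = [text for text, weight in options if weight == expected]
        # If the meter rules out every candidate, keep them all rather than guess.
        pruned.append(fitting if fitting else [text for text, _ in options])
    return pruned


# Hypothetical three-syllable fragment with a made-up meter "GGL".
lattice = [
    [("kaa", "G"), ("ka", "L")],   # OCR unsure whether the long-vowel sign is present
    [("ntaa", "G")],               # unambiguous position
    [("vi", "L"), ("vii", "G")],   # short vs. long vowel confusion
]
result = prune_by_meter(lattice, "GGL")
```

Because the filter only removes alternatives and falls back to the full candidate list when nothing fits, a sketch like this could sit in front of a dictionary-based or statistical post-processor, in keeping with the abstract's claim that the approach is complementary to other methods.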
Keywords :
natural language processing; optical character recognition; Indian language postprocessing; classical poetry structure; dictionary-based postprocessing; language recognizer; Dictionaries; Engines; Error correction; Information technology; Natural languages; Optical character recognition software; Robustness; Speech enhancement; Speech recognition; Vocabulary;
Conference_Title :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
Print_ISBN :
978-0-7695-2822-9
DOI :
10.1109/ICDAR.2007.4377113