• DocumentCode
    2423648
  • Title
    OCR error detection and correction of an inflectional Indian language script
  • Author
    Chaudhuri, B.B. ; Pal, U.
  • Author_Institution
    Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India
  • Volume
    3
  • fYear
    1996
  • fDate
    25-29 Aug 1996
  • Firstpage
    245
  • Abstract
    This paper deals with an OCR error detection and correction technique for a highly inflectional language script like Bangla (a major Indian language). This is the first report of its kind. Using two separate lexicons of root words and suffixes, candidate root-suffix pairs of each input word are detected, their grammatical agreement is tested, and the root/suffix part in which the error has occurred is noted. The correction is made on the corresponding error part of the input string by a fast dictionary access technique. To do so, some alternative strings are generated for an erroneous word. Among the alternative strings, those satisfying grammatical agreement in root-suffix and also having the smallest Levenstein-Damerau distance are finally chosen as the correct ones. The system has an accuracy of 75.61%.
  • Keywords
    optical character recognition; table lookup; Bangla; Levenstein-Damerau distance; OCR error correction; OCR error detection; candidate root-suffix pairs; fast dictionary access technique; grammatical agreement; inflectional Indian language script; lexicons; root words; Character recognition; Computer errors; Computer vision; Dictionaries; Error correction; Libraries; Optical character recognition software; Optical design; Pattern recognition; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 1996., Proceedings of the 13th International Conference on
  • Conference_Location
    Vienna
  • ISSN
    1051-4651
  • Print_ISBN
    0-8186-7282-X
  • Type
    conf
  • DOI
    10.1109/ICPR.1996.546947
  • Filename
    546947
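
The selection step described in the abstract (keep the alternative strings whose root and suffix agree grammatically, then pick those at the smallest edit distance) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy English lexicons `ROOTS` and `SUFFIXES`, the class-based agreement rule, and the function names are hypothetical. The distance used is the optimal string alignment variant of the Damerau-Levenshtein distance, and candidates are enumerated by brute force rather than by the fast dictionary access technique or the error localization the abstract mentions.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein):
    insertions, deletions, substitutions, and adjacent transpositions cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "kl" <-> "lk".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Hypothetical toy lexicons (not from the paper).
# ROOTS maps each root to a grammatical class; SUFFIXES maps each suffix
# to the set of root classes it is allowed to attach to.
ROOTS = {"walk": "verb", "talk": "verb", "book": "noun"}
SUFFIXES = {"": {"verb", "noun"}, "ed": {"verb"}, "ing": {"verb"}, "s": {"verb", "noun"}}


def agrees(root: str, suffix: str) -> bool:
    """Assumed agreement test: the suffix must accept the root's class."""
    return ROOTS[root] in SUFFIXES[suffix]


def correct(word: str) -> list[str]:
    """Return the grammatically valid root+suffix strings closest to `word`."""
    candidates = [r + s for r in ROOTS for s in SUFFIXES if agrees(r, s)]
    best = min(osa_distance(word, c) for c in candidates)
    return sorted(c for c in candidates if osa_distance(word, c) == best)


print(correct("wakled"))   # a transposed "lk"
print(correct("tolking"))  # a substituted vowel
```

Because several candidates can tie at the minimum distance, `correct` returns a list rather than a single string, which mirrors the abstract's wording that the closest agreeing strings are "chosen as the correct ones".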