Title :
Translation-Inspired OCR
Author :
Genzel, Dmitriy ; Popat, Ashok C. ; Spasojevic, Nemanja ; Jahr, Michael ; Senior, Andrew ; Ie, Eugene ; Tang, Frank Yung-Fong
Author_Institution :
Google, Inc., Mountain View, CA, USA
Abstract :
Optical character recognition is carried out using techniques borrowed from statistical machine translation. In particular, the use of multiple simple feature functions in linear combination, along with minimum-error-rate training, integrated decoding, and N-gram language modeling, is found to be remarkably effective across several scripts and languages. Results are presented using both synthetic and real data in five languages.
Keywords :
computational linguistics; decoding; image coding; language translation; optical character recognition; N-gram language modeling; integrated decoding; minimum-error-rate training; multiple simple feature function; statistical machine translation; translation-inspired OCR; text analysis
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2011.269