DocumentCode :
3488664
Title :
Unified Performance Evaluation for OCR Zoning: Calculating Page Segmentation's Score, That Includes Text Zones, Tables and Non-text Objects
Author :
Deryagin, Dmitry
Author_Institution :
Document Anal. Group, ABBYY Production, Moscow, Russia
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
953
Lastpage :
957
Abstract :
Optical character recognition (OCR) systems decompose printed pages into a set of text zones, tables, and non-text objects such as pictures and charts. This part of the OCR process is known as the page zoning task. In this paper we present a methodology for assessing page zoning as a whole. Many authors evaluate the locations of tables, pictures, and text separately. The key advantage of the proposed system is that it naturally combines the evaluation of text and table locations, and it is resistant to most segmentation ambiguities. We calculate the score for text and tables based on ground-truth character locations. The score for non-text object locations is based on area matching. These scores are combined to get the final page score.
Keywords :
image segmentation; optical character recognition; text analysis; OCR systems; OCR zoning; area matching; charts; ground-truth character locations; nontext object locations; optical character recognition systems; page segmentation score; page zoning task; picture location evaluation; printed page decomposition; table location evaluation; text location evaluation; text zones; unified performance evaluation; Text analysis; OCR Evaluation; Page Segmentation; Zoning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.193
Filename :
6628758