Title :
An edge-based block segmentation and classification for document analysis with automatic character string extraction
Author :
Park, Chang-Joon ; Jeon, Joon-Hyung ; Koo, Tak-Mo ; Choi, Heung-Moon
Author_Institution :
Sch. of Electron. & Electr. Eng., Kyungpook Nat. Univ., Taegu, South Korea
Abstract :
Presents an edge-based block segmentation and classification method with automatic character string extraction for document analysis. By exploiting only four edge features derived from the gradient and the orientation of the edge pixels, block segmentation, block classification, and character string extraction are all made insensitive to background noise and brightness variation in the image. The blocks of a document image are classified into seven categories: small-sized letters, large-sized letters, tables, equations, flow charts, graphs, and photographs. The first five are character blocks, which contain text, and the last two are non-character blocks. Introducing the column and text-line intervals of the document into the constrained run length algorithm (CRLA) yields an efficient block segmentation with a reduced memory requirement. Simulation results show that document image segmentation, block classification, and character string extraction can be performed efficiently and concurrently.
Keywords :
document image processing; edge detection; feature extraction; image classification; image segmentation; automatic character string extraction; constrained run length algorithm; document analysis; edge-based block segmentation; edge-based classification; equations; flow charts; graphs; large-sized letters; photographs; small-sized letters; tables; Background noise; Brightness; Computer science; Data mining; Equations; Feature extraction; Flowcharts; Image segmentation; Pixel; Text analysis
Conference_Title :
Systems, Man, and Cybernetics, 1996., IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
0-7803-3280-6
DOI :
10.1109/ICSMC.1996.569881