Regional Information Center for Science and Technology - Statistical-based approach to word segmentation

DocumentCode :

2871045

Title :

Statistical-based approach to word segmentation

Author :

Wang, Yalin ; Phillips, Ihsin T. ; Haralick, Robert

Author_Institution :

Dept. of Electr. Eng., Washington Univ., Seattle, WA, USA

Volume :

fYear :

2000

fDate :

2000

Firstpage :

555

Abstract :

This paper presents a text word extraction algorithm that takes a set of bounding boxes of glyphs and their associated text lines of a given document and partitions the glyphs into a set of text words, using only the geometric information of the input glyphs. The algorithm is probability based. An iterative, relaxation-like method is used to find the partitioning solution that maximizes the joint probability. To evaluate the performance of our text word extraction algorithm, we used a 3-fold validation method and developed a quantitative performance measure. The algorithm was evaluated on the UW-III database of some 1600 scanned document image pages. An area-overlap measure was used to find the correspondence between the detected entities and the ground truth. For a total of 827,433 ground truth words, the algorithm identified and segmented 800,149 words correctly, an accuracy of 97.43%.
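
The abstract mentions an area-overlap measure for matching detected words to ground-truth words but does not give its exact form. The following is only a minimal sketch of one common way such a measure could work, not the authors' definition. The box layout `(x0, y0, x1, y1)`, the function names, the greedy one-to-one matching, and the `thresh` value of 0.9 are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box (0 if the box is degenerate)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def count_correct(truth: List[Box], detected: List[Box],
                  thresh: float = 0.9) -> int:
    """Count ground-truth boxes matched one-to-one by a detected box.

    Assumed criterion: the intersection must cover at least `thresh`
    of BOTH the ground-truth box and the detected box.
    """
    used = set()  # indices of detected boxes already matched
    correct = 0
    for t in truth:
        for i, d in enumerate(detected):
            if i in used or area(t) == 0 or area(d) == 0:
                continue
            inter = overlap_area(t, d)
            if inter / area(t) >= thresh and inter / area(d) >= thresh:
                used.add(i)
                correct += 1
                break
    return correct


# Toy data: the second detection merges a word with its neighbour,
# so it covers the ground-truth box but is too large to count as a match.
truth = [(0, 0, 10, 5), (12, 0, 20, 5)]
detected = [(0, 0, 10, 5), (12, 0, 30, 5)]
accuracy = count_correct(truth, detected) / len(truth)
```

Requiring the overlap to be high relative to both boxes penalizes merged words as well as split ones. A looser criterion would count the merged detection in the toy data as correct.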

Keywords :

character recognition; document image processing; feature extraction; image segmentation; iterative methods; probability; statistical analysis; document images; iterative method; text word extraction algorithm; word segmentation; Computer science; Data mining; Image databases; Page description languages; Partitioning algorithms; Software algorithms; Software engineering; Testing; Text analysis

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Pattern Recognition, 2000. Proceedings. 15th International Conference on

Conference_Location :

Barcelona

ISSN :

1051-4651

Print_ISBN :

0-7695-0750-6

Type :

conf

DOI :

10.1109/ICPR.2000.902980

Filename :

902980

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2871045