DocumentCode
2502649
Title
Binarization of Color Characters in Scene Images Using k-means Clustering and Support Vector Machines
Author
Kita, Kohei ; Wakahara, Toru
Author_Institution
Fac. of Comput. & Inf. Sci., Hosei Univ., Koganei, Japan
fYear
2010
fDate
23-26 Aug. 2010
Firstpage
3183
Lastpage
3186
Abstract
This paper proposes a new technique for binarizing multicolored characters subject to heavy degradations. The key ideas are threefold. The first is generation of tentatively binarized images via every dichotomization of k clusters obtained by k-means clustering in the HSI color space. The total number of tentatively binarized images equals 2^k - 2. The second is use of support vector machines (SVM) to determine whether and to what degree each tentatively binarized image represents a character or non-character. We feed the SVM with mesh and weighted direction code histogram features to output the degree of “character-likeness.” The third is selection of a single binarized image with the maximum degree of “character-likeness” as an optimal binarization result. Experiments using a total of 1000 single-character color images extracted from the ICDAR 2003 robust OCR dataset show that the proposed method achieves a correct binarization rate of 93.7%.
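The count of 2^k - 2 tentative binarizations can be illustrated with a minimal sketch. It assumes the cluster labels are already available as a 2-D grid of integers; the function names `dichotomizations` and `tentative_binarizations` and the toy data are hypothetical and not taken from the paper. The k-means step and the SVM scoring are not shown.

```python
from itertools import combinations


def dichotomizations(k):
    """Yield every non-empty proper subset of cluster indices 0..k-1.

    Each subset is treated as the foreground; the remaining clusters
    form the background. There are 2**k - 2 such subsets.
    """
    for size in range(1, k):
        for subset in combinations(range(k), size):
            yield frozenset(subset)


def tentative_binarizations(label_image, k):
    """Yield (foreground_set, binary_image) for each dichotomization.

    label_image is a list of rows, each a list of cluster indices.
    """
    for foreground in dichotomizations(k):
        binary = [[1 if lab in foreground else 0 for lab in row]
                  for row in label_image]
        yield foreground, binary


# Toy 2x3 label image with k = 3 clusters (made-up data for illustration).
label_image = [[0, 1, 2], [2, 1, 0]]
candidates = list(tentative_binarizations(label_image, 3))
print(len(candidates))  # 2**3 - 2 = 6
```

In a full pipeline along the lines the abstract describes, each candidate would then be scored by a classifier and the highest-scoring one kept.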
Keywords
document image processing; image colour analysis; pattern clustering; support vector machines; text analysis; color characters binarization; k-means clustering; multicolored characters; scene images; Character recognition; Feature extraction; Histograms; Image color analysis; Pixel; Support vector machines; Training data; binarization of multicolored characters; figure-ground discrimination
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location
Istanbul
ISSN
1051-4651
Print_ISBN
978-1-4244-7542-1
Type
conf
DOI
10.1109/ICPR.2010.779
Filename
5597180