• DocumentCode
    2502649
  • Title
    Binarization of Color Characters in Scene Images Using k-means Clustering and Support Vector Machines
  • Author
    Kita, Kohei; Wakahara, Toru
  • Author_Institution
    Fac. of Comput. & Inf. Sci., Hosei Univ., Koganei, Japan
  • fYear
    2010
  • fDate
    23-26 Aug. 2010
  • Firstpage
    3183
  • Lastpage
    3186
  • Abstract
    This paper proposes a new technique for binarizing multicolored characters subject to heavy degradation. The key ideas are threefold. The first is the generation of tentatively binarized images via every dichotomization of the k clusters obtained by k-means clustering in the HSI color space. The total number of tentatively binarized images equals 2^k − 2. The second is the use of support vector machines (SVM) to determine whether, and to what degree, each tentatively binarized image represents a character or a non-character. We feed the SVM with mesh and weighted direction code histogram features to output the degree of "character-likeness." The third is the selection of the single binarized image with the maximum degree of "character-likeness" as the optimal binarization result. Experiments using a total of 1000 single-character color images extracted from the ICDAR 2003 robust OCR dataset show that the proposed method achieves a correct binarization rate of 93.7%.
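The dichotomization and selection steps described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the k-means step in HSI space has already produced a per-pixel cluster label map, and the `score` callable is a hypothetical stand-in for the SVM's "character-likeness" output (the mesh and weighted direction code histogram features are not implemented). The names `dichotomize` and `select_best` are illustrative only. Because each non-empty proper subset of clusters is taken as foreground in turn, a split and its complement both appear, which is why the count is 2^k − 2.

```python
from itertools import combinations
from typing import Callable, List, Sequence

LabelMap = List[List[int]]     # per-pixel cluster index in 0..k-1 (assumed k-means output)
BinaryImage = List[List[int]]  # 1 = foreground (character), 0 = background


def dichotomize(labels: LabelMap, k: int) -> List[BinaryImage]:
    """Hypothetical helper: build one tentative binary image for every
    choice of a non-empty, proper subset of the k clusters as foreground.

    Subset sizes run from 1 to k-1 (the empty and full sets are excluded),
    so exactly 2**k - 2 images are returned.
    """
    images: List[BinaryImage] = []
    for size in range(1, k):
        for fg in combinations(range(k), size):
            fg_set = set(fg)
            images.append([[1 if c in fg_set else 0 for c in row] for row in labels])
    return images


def select_best(images: Sequence[BinaryImage],
                score: Callable[[BinaryImage], float]) -> BinaryImage:
    """Hypothetical helper: return the tentative binarization with the
    highest score. In the paper's setting, `score` would be the SVM's
    degree of character-likeness; here it is any user-supplied function."""
    return max(images, key=score)


if __name__ == "__main__":
    # Toy 2x3 label map with k = 3 clusters.
    labels = [[0, 1, 2], [0, 0, 2]]
    tentative = dichotomize(labels, k=3)
    print(len(tentative))  # 2**3 - 2 = 6
    # Toy score (assumption, not an SVM): prefer the fewest foreground pixels.
    best = select_best(tentative, lambda img: -sum(map(sum, img)))
    print(best)
```

With the toy score above, the selected image is the one whose foreground is cluster 1 alone; swapping in a trained classifier's decision value would reproduce the max-character-likeness selection rule the abstract describes.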
  • Keywords
    document image processing; image colour analysis; pattern clustering; support vector machines; text analysis; color characters binarization; k-means clustering; multicolored characters; scene images; support vector machines; Character recognition; Feature extraction; Histograms; Image color analysis; Pixel; Support vector machines; Training data; binarization of multicolored characters; figure-ground discrimination; k-means clustering; support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2010 20th International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-7542-1
  • Type
    conf
  • DOI
    10.1109/ICPR.2010.779
  • Filename
    5597180