DocumentCode :
131203
Title :
Clustering low quality Farsi sub-words for word recognition
Author :
Yarmohammadi, Hamed Arab ; Fard, Alireza Ahmady ; Khosravi, Hossein
Author_Institution :
Fac. of Electr. & Robotic Eng., Shahrood Univ. of Technol., Shahrood, Iran
fYear :
2014
fDate :
4-6 Feb. 2014
Firstpage :
1
Lastpage :
5
Abstract :
OCR of low-resolution documents is uncommon because such documents pose many recognition problems. However, there are now several archives of digital documents that were scanned at low resolution to save storage. These documents, which usually have a resolution of 100 to 150 dpi, need to be converted into searchable documents. This paper presents a new method for clustering low-quality printed Persian sub-words. Clustering is needed to reduce the number of sub-word classes and thereby improve the overall recognition rate. Two popular clustering methods, hierarchical and k-means, are implemented and compared. Local binary patterns (LBP) and zoning algorithms are used for feature extraction; both features are fast to compute and represent global shape information well. Moreover, we used different distance measures to assess the similarity of feature vectors. We applied our algorithms to a dataset of 10,700 images of distinct Persian sub-words at 96 dpi resolution. Experimental results show that hierarchical clustering with the correlation distance measure performs best among the clustering methods and distance measures tested.
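The abstract names the pipeline (zoning and LBP features, a distance measure, hierarchical clustering) but not its parameters. The following is a minimal standard-library sketch under our own assumptions, not the authors' implementation. A sub-word image is taken to be a list of rows of pixel intensities (higher means more ink). The 4 x 4 zoning grid, the basic 3x3 8-neighbour LBP, average linkage, the target cluster count `k`, and the helper names (`zoning_features`, `lbp_histogram`, `correlation_distance`, `hierarchical_clusters`, `subword_features`) are all hypothetical. The correlation distance is 1 minus the Pearson correlation of two feature vectors.

```python
import math
from itertools import combinations

# Assumption: an image is a list of rows of pixel intensities, higher = more ink.

def zoning_features(img, grid=4):
    """Mean intensity in each cell of a grid x grid partition (hypothetical grid size)."""
    h, w = len(img), len(img[0])
    feats = []
    for zi in range(grid):
        r0, r1 = zi * h // grid, (zi + 1) * h // grid
        for zj in range(grid):
            c0, c1 = zj * w // grid, (zj + 1) * w // grid
            cell = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)  # empty zone -> 0.0
    return feats

# Basic 3x3 LBP: the 8 neighbours, visited clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over interior pixels."""
    h, w = len(img), len(img[0])
    hist = [0] * 256
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set this bit when the neighbour value is >= the centre value.
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else [0.0] * 256

def correlation_distance(u, v):
    """1 - Pearson correlation; constant vectors are treated as uncorrelated (our convention)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den if den else 1.0

def hierarchical_clusters(vectors, k, dist=correlation_distance):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    d = {(i, j): dist(vectors[i], vectors[j])
         for i, j in combinations(range(len(vectors)), 2)}

    def linkage(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(d[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is still valid
        clusters[x] = merged
    return clusters

def subword_features(img):
    """Zoning + LBP concatenated (an illustrative choice, not taken from the paper)."""
    return zoning_features(img) + lbp_histogram(img)

if __name__ == "__main__":
    vecs = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
    print(hierarchical_clusters(vecs, k=2))  # [[0, 1], [2, 3]]
```

In use, one would call `subword_features` on each sub-word image and pass the resulting vectors to `hierarchical_clusters`. Whether the paper combined the two feature types or evaluated them separately is not stated in the abstract; concatenation here is only for illustration. Swapping `dist` for a Euclidean function is one way to compare distance measures as the abstract describes.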
Keywords :
document image processing; feature extraction; image resolution; optical character recognition; pattern clustering; LBP; OCR; correlation distance measure; digital documents; distance measures; feature extraction; feature vector similarity; global shape information; hierarchical clustering; k-means clustering; local binary patterns; low quality Farsi sub-word clustering; low quality printed Persian sub-word clustering; low resolution documents; word recognition; zoning algorithms; Clustering algorithms; Clustering methods; Correlation; Feature extraction; Image recognition; Image resolution; Vectors; Clustering; Hierarchical; K-means; Local Binary Patterns; Low Resolution; Persian Typing Sub-Words; Zoning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Systems (ICIS), 2014 Iranian Conference on
Conference_Location :
Bam
Print_ISBN :
978-1-4799-3350-1
Type :
conf
DOI :
10.1109/IranianCIS.2014.6802518
Filename :
6802518