Conference record number
144
Article title
Clustering Low Quality Farsi Sub-words For Word Recognition
Authors
ArabYarmohammadi Hamed (author), AhmadyFard Alireza (author), Khosravi Hossein (author)
Number of pages
5
Keywords
Persian Typing Sub-Words, Clustering, Hierarchical, K-means, Low resolution, Local binary patterns, Zoning
Conference title
Proceedings of the 12th Iranian Conference on Intelligent Systems
Document language
Persian
Persian abstract
OCR of low-resolution documents is uncommon because it poses
many problems. However, there are now several archives of digital
documents that were scanned at low resolution to save storage.
These documents, which usually have a resolution of 100 to 150 dpi,
need to be converted into searchable documents. This paper presents
a new method for clustering low-quality printed Persian sub-words.
Clustering is necessary to reduce the number of sub-word classes
and thereby improve the overall recognition rate. Two popular
clustering methods, hierarchical and k-means, are implemented and
compared. Local binary patterns (LBP) and zoning algorithms are
used for feature extraction. Both features are fast to compute and
represent global shape information very well. Moreover, we use
different distance measures to assess the similarity of feature
vectors. We applied our algorithms to a dataset of 10,700 images
of distinct Persian sub-words at 96 dpi resolution.
Experimental results show that hierarchical clustering with the
correlation distance measure outperforms the other clustering
methods and distance measures.
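
The pipeline named in the abstract (zoning features, a correlation distance, hierarchical clustering) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The 2x2 zone grid, the average-linkage rule, the toy 4x4 binary images, and all function names are assumptions made for illustration, and LBP features are omitted.

```python
from math import sqrt

def zoning_features(image, zones=2):
    """Split a binary image (rows of 0/1) into zones x zones cells and
    return the ink density of each cell. The grid size is an assumption."""
    h, w = len(image), len(image[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cell) / len(cell))
    return feats

def correlation_distance(u, v):
    """Return 1 minus the Pearson correlation of two feature vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 if den == 0 else 1.0 - num / den

def agglomerative(vectors, n_clusters):
    """Hierarchical (agglomerative) clustering with average linkage.
    The linkage rule is an assumption; the abstract does not state one."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(correlation_distance(vectors[i], vectors[j])
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        _, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy data (hypothetical): two left-heavy and two right-heavy shapes.
img_a = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
img_b = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
img_c = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 1]]
img_d = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1]]

features = [zoning_features(img) for img in (img_a, img_b, img_c, img_d)]
clusters = agglomerative(features, n_clusters=2)
print(clusters)
```

On real data, a library implementation of hierarchical clustering would likely be preferable to this quadratic-time loop, which is kept only for readability.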
Conference document number
3817034
Publication year
2014
From page
1
To page
5