Title :
Bangla text processing and recognition based on Fuzzy unsupervised Feature Extraction and SVM
Author :
Haque Monil, Mohammad Alaul ; Zulkar Nine, Md S. Q. ; Poon, Bruce ; Amini, M. Ashraful ; Hong Yan
Author_Institution :
Comput. Vision & Cybern. Res. Group, Comput. Sci. & Eng., Indep. Univ., Dhaka, Bangladesh
Abstract :
Optical character recognition (OCR) is a widely used technology for converting text images into editable text. Researchers have already proposed many machine learning algorithms to address this problem. However, Bangla text recognition is a highly challenging task because of its complicated writing style, compound characters, and highly diversified fonts. To address the segmentation problem, we propose an algorithm named Blob-Labeled Character Segmentation (BLCS), which begins with extensive preprocessing to extract the characters from text. Our novel character segmentation procedure extracts characters with 97.5% accuracy. Unsupervised feature learning has become a powerful tool in machine learning. To increase the recognition rate of the characters, we introduce a fuzzy unsupervised feature learning algorithm to learn features of individual characters. We then use an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) to classify the characters. The SVM achieves 99.4% accuracy, outperforming all other approaches.
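Illustrative note: the abstract does not describe the internals of BLCS, so the following is only a minimal sketch of generic blob (connected-component) labeling, the general idea the algorithm's name suggests. The function name `label_blobs`, the list-of-rows image format, and the choice of 8-connectivity are assumptions made for illustration and are not taken from the paper. Characters in a Bangla word are typically joined by a headline (matra), so a plain labeling pass like this one would merge them unless preprocessing separates them first; that step is not shown here.

```python
from collections import deque


def label_blobs(binary):
    """Find 8-connected foreground blobs in a binary image.

    `binary` is a list of rows containing 0 (background) or 1 (foreground).
    Returns inclusive bounding boxes (top, left, bottom, right),
    sorted by their left edge.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours (the centre pixel is already seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    boxes.sort(key=lambda box: box[1])
    return boxes
```

Each returned box could then be cropped and passed on to a feature extractor and classifier; those stages are not sketched here because the abstract gives no detail on them.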
Keywords :
fuzzy set theory; image segmentation; learning (artificial intelligence); natural language processing; neural nets; optical character recognition; text analysis; ANN; BLCS; Bangla text recognition; Bangla text processing; OCR; SVM; artificial neural network; blob-labeled character segmentation; editable text; fuzzy unsupervised feature extraction; machine learning algorithms; optical character recognition; segmentation problem; support vector machine; text images; unsupervised feature learning; Abstracts; Artificial neural networks; Histograms; Image segmentation; Optical imaging; Support vector machines; Three-dimensional displays; Artificial neural network (ANN); Optical character recognition (OCR); Support vector machine (SVM); Type-I fuzzy system; Unsupervised feature learning;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2013 International Conference on
Conference_Location :
Tianjin
DOI :
10.1109/ICMLC.2013.6890784