DocumentCode
2278774
Title
A language independent text segmentation technique based on naive bayes classifier
Author
Bidgoli, A.M. ; Boraghi, M.
Author_Institution
North Tehran Branch, Islamic Azad Univ., Tehran, Iran
fYear
2010
fDate
15-17 Dec. 2010
Firstpage
11
Lastpage
16
Abstract
An important stage in an optical character recognition system is the segmentation of text components from the non-text components of input images. In this paper, a machine learning technique based on a naive Bayes classifier is developed for text component segmentation. In the training stage, a simple procedure is used to generate a large collection of training data sets for training the classifier. A collection of handwritten and printed Persian and English pictorial images that have been manually separated is used for training. Post-processing is applied to improve the segmentation results. Several representative document images scanned from Persian, English, and Chinese handwritten and printed documents are used to verify the effectiveness of the developed algorithm.
Keywords
Bayes methods; character recognition; document image processing; image segmentation; learning (artificial intelligence); Chinese handwritings; English pictorial Images; Persian pictorial Images; document images; language independent text segmentation technique; machine learning technique; naive bayes classifier; nontext components; optical character recognition system; text components segmentation; training data sets; Classification algorithms; Equations; Image edge detection; Image segmentation; Mathematical model; Training; Training data; Documents Image Analyses; Naive Bayes Classifier; OCR; Text Segmentation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal and Image Processing (ICSIP), 2010 International Conference on
Conference_Location
Chennai
Print_ISBN
978-1-4244-8595-6
Type
conf
DOI
10.1109/ICSIP.2010.5697433
Filename
5697433