Title :
LBP Based Line-Wise Script Identification
Author :
Ferrer, Miguel A. ; Morales, Aythami ; Pal, Umapada
Author_Institution :
Inst. Univ. para el Desarrollo Tecnol. y la Innovacion en Comun. (IDeTIC), Univ. de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Abstract :
Script identification is an important step in multi-script document analysis. Because the textures present in the text portion of a script are its main distinguishing features, this paper proposes a new algorithm for printed script identification based on texture analysis. Since local patterns are a unifying concept for the traditional statistical and structural approaches to texture analysis, the basic idea is to use the histogram of local patterns as a description of the distribution of script stroke directions, which is characteristic of each script. As local patterns, the basic version of the Local Binary Patterns (LBP) and a modified version of the Orientation of the Local Binary Patterns (OLBP) are proposed. A Least Squares Support Vector Machine (LS-SVM) is used as the identifier. The scheme has been verified on two databases. The first, the training database, contains 200 sheets in 10 different scripts. The script fonts are provided by the Google translator. The second, the test database, was obtained by scanning different newspapers and books. It contains 5 of the 10 scripts in the first database. The experiments gave encouraging results.
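The histogram-of-local-patterns idea in the abstract can be illustrated with the standard basic 3x3 LBP operator. The following is a minimal sketch and not the authors' implementation: the function name `lbp_histogram`, the clockwise neighbor ordering, the `>=` thresholding convention, and the normalization are assumptions made for illustration, and the OLBP variant and the LS-SVM classifier are not reproduced.

```python
def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 3x3 LBP codes.

    `image` is a 2-D grayscale image given as a list of equal-length rows.
    Border pixels are skipped because they lack a full 3x3 neighborhood.
    """
    # Neighbor offsets (row, col), clockwise from top-left.
    # This ordering is an assumption; any fixed ordering gives a valid LBP code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    # Normalize so text lines of different sizes give comparable feature vectors.
    return [h / total for h in hist] if total else hist


# A one-pixel-wide vertical stroke: only the neighbors above and below
# the center match it, so a single LBP code receives all the mass.
stroke = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]
features = lbp_histogram(stroke)
```

In a line-wise setting, a vector like `features` would be computed for each segmented text line and passed to a classifier; the abstract names an LS-SVM for that role.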
Keywords :
document image processing; image texture; least squares approximations; optical character recognition; support vector machines; text detection; visual databases; Google translator; LBP-based line wise script identification; LS-SVM; OCR; identifier; least square support vector machine; local binary patterns; local pattern histogram; multiscript document analysis; printed script identification; script features; script stroke direction distribution; text portion; texture analysis; training database; Databases; Feature extraction; Histograms; Image segmentation; Support vector machines; Testing; Training; Document Analysis; LBP; Multi-script OCR; Script Identification; Texture Measures;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.81