DocumentCode
1990518
Title
Language identification in historical Afghan manuscripts
Author
Farooq, Faisal ; Govindaraju, Venu
Author_Institution
CEDAR, State Univ. of New York, Amherst, NY
fYear
2007
fDate
12-15 Feb. 2007
Firstpage
1
Lastpage
4
Abstract
Automatic language identification is an important step prior to optical character recognition (OCR). In this paper, we present a system to discriminate between Arabic and Persian in historical Afghan manuscripts. Classification is performed at the sub-sentence level. We propose a feature extraction algorithm for sub-sentences based on Gabor filters, followed by classification using a support vector machine (SVM). Overall precisions of 96.72% and 94.90% are obtained for Persian and Arabic, respectively.
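The abstract names two stages: Gabor-filter features computed on a sub-sentence image, then an SVM classifier. This record does not give the filter-bank parameters, the statistics taken from the filter responses, or the SVM kernel, so the following is only an illustrative sketch under stated assumptions: a bank of real Gabor kernels at four orientations, the mean and standard deviation of the absolute responses as the feature vector, and a linear SVM without a bias term trained with the Pegasos sub-gradient method (a swapped-in solver, not necessarily what the authors used). All function names and default parameter values (`gabor_kernel`, `gabor_features`, `train_linear_svm`, `sigma=1.5`, `wavelength=4.0`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical sketch: filter parameters, feature statistics and the SVM solver
# are assumptions for illustration, not the paper's published configuration.


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel as a size x size list of rows (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinate frame by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope times a cosine carrier along the rotated x axis.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def filter_response(image, kernel):
    """Valid-mode 2-D correlation of a grayscale image (list of rows) with a kernel."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows - k + 1):
        out.append([
            sum(image[i + u][j + v] * kernel[u][v] for u in range(k) for v in range(k))
            for j in range(cols - k + 1)
        ])
    return out


def gabor_features(image, orientations=4, size=5, sigma=1.5, wavelength=4.0):
    """Mean and standard deviation of |response| for each filter orientation."""
    feats = []
    for n in range(orientations):
        theta = n * math.pi / orientations
        resp = filter_response(image, gabor_kernel(size, sigma, theta, wavelength))
        vals = [abs(r) for row in resp for r in row]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((val - mean) ** 2 for val in vals) / len(vals))
        feats.extend([mean, std])
    return feats


def train_linear_svm(X, y, lam=0.1, epochs=1000, seed=0):
    """Linear SVM (no bias term) trained with Pegasos; labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink w (regularization), then take the hinge-loss sub-gradient step.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, features):
    """Return +1 or -1 according to the side of the hyperplane w . x = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, features)) >= 0 else -1


if __name__ == "__main__":
    # Toy stand-ins for two classes: vertical vs. horizontal stripes (NOT manuscript data).
    vertical = [[1.0 if (j // 2) % 2 == 0 else 0.0 for j in range(16)] for _ in range(16)]
    horizontal = [list(col) for col in zip(*vertical)]
    X = [gabor_features(vertical), gabor_features(horizontal)]
    w = train_linear_svm(X, [1, -1])
    print(predict(w, X[0]), predict(w, X[1]))
```

The demo uses synthetic vertical and horizontal stripes only because their orientation energy differs clearly; labeled Arabic and Persian sub-sentence images (+1 for one language, -1 for the other) would take their place. A practical system would more likely rely on an established SVM library with standardized features; the plain-Python solver here only keeps the example dependency-free.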
Keywords
Gabor filters; feature extraction; history; image classification; natural language processing; optical character recognition; support vector machines; Persian; automatic language identification; feature extraction algorithm; historical Afghan manuscripts; sub-sentence classification; support vector machine; Character recognition; Optical character recognition software; Optical filters; Support vector machine classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on
Conference_Location
Sharjah
Print_ISBN
978-1-4244-0778-1
Electronic_ISBN
978-1-4244-1779-8
Type
conf
DOI
10.1109/ISSPA.2007.4555588
Filename
4555588