• DocumentCode
    1990518
  • Title
    Language identification in historical Afghan manuscripts
  • Author
    Farooq, Faisal ; Govindaraju, Venu
  • Author_Institution
    CEDAR, State Univ. of New York, Amherst, NY
  • fYear
    2007
  • fDate
    12-15 Feb. 2007
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Automatic language identification is an important step prior to optical character recognition (OCR). In this paper, we present a system to discriminate between Arabic and Persian in historical Afghan manuscripts. The classification is performed at the sub-sentence level. We propose a feature extraction algorithm for a sub-sentence based on Gabor filters, followed by classification using a support vector machine (SVM). Overall precisions of 96.72% and 94.90% are obtained for Persian and Arabic, respectively.
  • Keywords
    Gabor filters; feature extraction; history; image classification; natural language processing; optical character recognition; support vector machines; Persian; automatic language identification; feature extraction algorithm; historical Afghan manuscripts; sub-sentence classification; support vector machine; Character recognition; Optical character recognition software; Optical filters; Support vector machine classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on
  • Conference_Location
    Sharjah
  • Print_ISBN
    978-1-4244-0778-1
  • Electronic_ISBN
    978-1-4244-1779-8
  • Type
    conf
  • DOI
    10.1109/ISSPA.2007.4555588
  • Filename
    4555588