• DocumentCode
    2489554
  • Title
    Word-wise Sinhala Tamil and English script identification using Gaussian kernel SVM
  • Author
    Chanda, Sukalpa ; Pal, Srikanta ; Pal, Umapada
  • Author_Institution
    Indian Stat. Inst., Kolkata, India
  • fYear
    2008
  • fDate
    8-11 Dec. 2008
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    In Sri Lanka, a single document page may contain Sinhala, Tamil and English text. For OCR development on such a page, it is better to identify the different scripts present and then feed each identified portion to its respective OCR module. In this paper, an SVM-based technique is proposed for word-wise identification of Sinhala, Tamil and English scripts from a single document page. Structural features, topological features and water-reservoir-principle-based features are the main features used for this purpose. The experiments gave encouraging results.
  • Keywords
    Gaussian processes; document image processing; feature extraction; optical character recognition; support vector machines; text analysis; Gaussian kernel SVM; OCR module; structural feature; topological feature; water reservoir principle-based feature; word-wise English script identification; word-wise Sinhala Tamil script identification; word-wise Sinhala script identification; Feeds; Kernel; Neural networks; Optical character recognition software; Reservoirs; Structural shapes; Support vector machine classification; Support vector machines; Water resources; Water storage
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
  • Conference_Location
    Tampa, FL
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-2174-9
  • Electronic_ISBN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2008.4761823
  • Filename
    4761823