• DocumentCode
    3037867
  • Title

    Language Identification from an Indian Multilingual Document Using Profile Features

  • Author

    Padma, M.C. ; Vijaya, P.A. ; Nagabhushan, P.

  • Author_Institution
    Dept. of CS & Eng., PES Coll. of Eng., Mandya
  • fYear
    2009
  • fDate
    8-10 March 2009
  • Firstpage
    332
  • Lastpage
    335
  • Abstract
    To reach a larger cross section of people, a document often needs to contain text in different languages. This, however, causes practical difficulty in OCRing such a document, because the language of the text must be determined before a particular OCR can be employed. This research work addresses the problem of recognizing the language of the text content. However, it is perhaps impossible to design a single recognizer that can identify a large number of scripts/languages. As a via media, we propose to work on the prioritized requirements of a particular region. For instance, in Karnataka state in India, a document, including an official one, generally contains text in three languages: English, the language of general importance; Hindi, the language of national importance; and Kannada, the language of state/regional importance. We propose to learn to identify the language of the text by thoroughly understanding the nature of the top and bottom profiles of the printed text lines in these three languages. The experiments involved 800 text lines for learning and 600 text lines for testing. The performance turned out to be 95.4%.
  • Keywords
    document handling; natural language processing; text analysis; English; Hindi; Indian multilingual document; Kannada; OCRing; language identification; text content; Automation; Books; Document image processing; Educational institutions; Feature extraction; Natural languages; Optical character recognition software; Testing; Text analysis; Text recognition; Bottom Profile; Document Image Processing; Feature extraction; Language Identification; Multi-lingual document; Top Profile
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Automation Engineering, 2009. ICCAE '09. International Conference on
  • Conference_Location
    Bangkok
  • Print_ISBN
    978-0-7695-3569-2
  • Type

    conf

  • DOI
    10.1109/ICCAE.2009.35
  • Filename
    4804543
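
The abstract refers to the top and bottom profiles of printed text lines without defining them. The sketch below uses one common definition, which is an assumption here and not necessarily the paper's: for each pixel column of a binarized text line, the top profile is the row of the first ink pixel seen from above, and the bottom profile is the row of the last ink pixel. The names `top_profile`, `bottom_profile`, and the toy image are hypothetical, and the features and classifier the authors derive from the profiles are not covered.

```python
from typing import List, Optional

# A binarized text line: a list of rows, each row a list of 0 (background) or 1 (ink).
Image = List[List[int]]


def top_profile(line: Image) -> List[Optional[int]]:
    """For each column, the row index of the first ink pixel from the top.

    Columns with no ink yield None. This definition is an assumption,
    not taken from the paper.
    """
    height = len(line)
    width = len(line[0]) if line else 0
    profile: List[Optional[int]] = []
    for col in range(width):
        hit: Optional[int] = None
        for row in range(height):  # scan downward
            if line[row][col]:
                hit = row
                break
        profile.append(hit)
    return profile


def bottom_profile(line: Image) -> List[Optional[int]]:
    """For each column, the row index of the last ink pixel (scanning upward)."""
    height = len(line)
    width = len(line[0]) if line else 0
    profile: List[Optional[int]] = []
    for col in range(width):
        hit: Optional[int] = None
        for row in range(height - 1, -1, -1):  # scan upward
            if line[row][col]:
                hit = row
                break
        profile.append(hit)
    return profile


# Hypothetical 3x4 toy "text line"; real input would be a segmented, binarized scan.
toy_line: Image = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]

print(top_profile(toy_line))     # [1, 0, None, 2]
print(bottom_profile(toy_line))  # [1, 2, None, 2]
```

Under this definition, a script with a continuous headline, as in Hindi (Devanagari), would be expected to give a nearly flat top profile across a word, while English and Kannada lines would vary more. How the paper turns such differences into features is not stated in the abstract.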