• DocumentCode
    3028622
  • Title
    Language identification for printed text independent of segmentation
  • Author
    Wood, Sally L. ; Yao, Xiaozhong ; Krishnamurthi, Kanthimathi ; Dang, Laurence
  • Author_Institution
    Santa Clara Univ., CA, USA
  • Volume
    3
  • fYear
    1995
  • fDate
    23-26 Oct 1995
  • Firstpage
    428
  • Abstract
    This paper presents efficient algorithms for determining the language classification of machine-generated documents without requiring the identification of individual characters. Such algorithms may be useful for sorting and routing facsimile documents as they arrive, so that appropriate routing and secondary analysis, which may include OCR, are selected for each document. They may also prove useful as a component of a content-addressable document access system. Numerous reported efforts have attempted to segment printed documents into homogeneous regions using Hough transforms, hidden Markov models, morphological filtering, and neural networks. However, language identification can be accomplished without explicit segmentation using the less computationally intensive methods described
  • Keywords
    Hough transforms; content-addressable storage; document image processing; facsimile; filtering theory; image segmentation; mathematical morphology; natural languages; optical character recognition; Hough transforms; content addressable document access system; facsimile documents; hidden Markov models; homogeneous regions; language classification; language identification; machine generated documents; morphological filtering; neural networks; printed documents; printed text; routing; secondary analysis; sorting; Algorithm design and analysis; Character generation; Facsimile; Filtering; Hidden Markov models; Independent component analysis; Neural networks; Optical character recognition software; Routing; Sorting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Image Processing, 1995. Proceedings., International Conference on
  • Conference_Location
    Washington, DC
  • Print_ISBN
    0-8186-7310-9
  • Type
    conf
  • DOI
    10.1109/ICIP.1995.537663
  • Filename
    537663