  • DocumentCode
    316820
  • Title
    Document block identification using a neural network
  • Author
    Strouthopoulos, C.; Papamarkos, N.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Democritus Univ. of Thrace, Xanthi, Greece
  • Volume
    2
  • fYear
    1997
  • fDate
    2-4 Jul 1997
  • Firstpage
    999
  • Abstract
    This paper describes a new method that clusters the content of a mixed-type document into text and non-text areas. The proposed approach is based on a new set of textural features combined with a two-stage neural network classifier, which consists of a principal components analyzer followed by a Kohonen self-organizing feature map. Document blocks are classified as text, graphics, or halftones, or into secondary subclasses corresponding to special cases of these primary classes. The method can identify text regions embedded in graphics, as well as overlapping regions, that is, regions that cannot be separated by horizontal and vertical cuts. The method was extensively tested on a variety of documents, with very promising results.
  • Keywords
    document image processing; image segmentation; self-organising feature maps; Kohonen self organized feature map; PCA; clustering; document block identification; graphics; halftones; mixed type document; nontext areas; principal components analyzer; secondary subclasses; segmentation; text areas; textural features; two-stage neural network classifier; Automatic testing; Circuits; Coils; Databases; Graphics; Histograms; Laboratories; Layout; Neural networks; Robustness;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 13th International Conference on Digital Signal Processing (DSP 97), 1997
  • Conference_Location
    Santorini, Greece
  • Print_ISBN
    0-7803-4137-6
  • Type
    conf
  • DOI
    10.1109/ICDSP.1997.628532
  • Filename
    628532
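The abstract describes a two-stage classifier: a principal components analyzer that reduces each block's textural feature vector, followed by a Kohonen self-organizing feature map (SOFM) that clusters the reduced vectors. The following is a minimal, hypothetical sketch of that pipeline, not the authors' implementation. This record does not specify the paper's textural features, map size, or training schedule, so the 4-D feature vectors, the 2-unit map, the learning-rate and neighbourhood schedules, and the function names `fit_pca`, `train_som`, and `best_matching_unit` are all assumptions made for illustration. It also swaps in plain power iteration with deflation to compute the principal components, rather than a neural PCA network.

```python
import math
import random


def fit_pca(X, k, iters=500, seed=0):
    """Return a function projecting a vector onto the top-k principal components.

    Assumption: components are found by power iteration with deflation on the
    sample covariance matrix (a stand-in for the paper's PCA stage).
    """
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate so the next power iteration converges to the next component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]

    def project(x):
        return [sum((x[j] - mu[j]) * c[j] for j in range(d)) for c in components]

    return project


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))


def train_som(data, n_units, epochs=100, lr0=0.5, sigma_end=0.5, seed=0):
    """Train a 1-D Kohonen self-organizing map with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_units)]  # init from samples
    sigma0 = max(n_units / 2.0, sigma_end)
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1.0 - frac)                        # linearly decaying learning rate
            sigma = sigma0 * (sigma_end / sigma0) ** frac  # exponentially shrinking radius
            b = best_matching_unit(weights, x)
            for u in range(n_units):
                h = math.exp(-((u - b) ** 2) / (2.0 * sigma ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights


def make_blocks(center, count, rng):
    """Hypothetical synthetic feature vectors jittered around a class center."""
    return [[c + rng.uniform(-0.05, 0.05) for c in center] for _ in range(count)]


rng = random.Random(42)
text_blocks = make_blocks([0.9, 0.8, 0.1, 0.2], 10, rng)      # synthetic "text-like"
halftone_blocks = make_blocks([0.1, 0.2, 0.9, 0.7], 10, rng)  # synthetic "halftone-like"
blocks = text_blocks + halftone_blocks

project = fit_pca(blocks, k=2)                  # stage 1: dimensionality reduction
reduced = [project(b) for b in blocks]
weights = train_som(reduced, n_units=2)         # stage 2: unsupervised clustering
labels = [best_matching_unit(weights, r) for r in reduced]
print(labels)
```

Each block is assigned the index of its best-matching map unit. In a setup like the paper's, those unit indices would then be associated with classes such as text, graphics, and halftones (and their subclasses); that labelling step is not shown here. Reducing the features before the SOFM keeps the map's input low-dimensional, which is the usual motivation for putting PCA in front of a self-organizing map.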