• DocumentCode
    3186966
  • Title
    A knowledge-based approach for textual information extraction from mixed text/graphics complex document images
  • Author
    Chen, Yen-Lin
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Taipei Univ. of Technol., Taipei, Taiwan
  • fYear
    2010
  • fDate
    10-13 Oct. 2010
  • Firstpage
    3270
  • Lastpage
    3277
  • Abstract
    A new knowledge-based technique for extracting and identifying text-lines from various real-life mixed text/graphics complex document images is presented in this paper. The proposed technique first decomposes the document image into distinct object planes to separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. A knowledge-based text extraction and identification method is then applied to the resulting planes to obtain text-lines with different characteristics in each plane. The proposed system offers high flexibility and expandability by simply adding new rules to cope with a wider variety of real-life and future complex document images. Experimental and comparative results demonstrate the effectiveness and advantages of the proposed knowledge-based technique in extracting text-lines with various illuminations, sizes, and font styles from various types of mixed text/graphics complex document images.
  • Keywords
    computer graphics; document handling; document image processing; information retrieval; knowledge based systems; text analysis; homogeneous objects; knowledge based approach; mixed text-graphics complex document image; nontext object; text lines identification; textual information extraction; textual region; Image segmentation; Document analysis; complex document images; knowledge-based systems; region segmentation; text extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems Man and Cybernetics (SMC), 2010 IEEE International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1062-922X
  • Print_ISBN
    978-1-4244-6586-6
  • Type
    conf
  • DOI
    10.1109/ICSMC.2010.5642309
  • Filename
    5642309