  • DocumentCode
    457398
  • Title
    Summarization of JBIG2 Compressed Indian Language Textual Images
  • Author
    Garain, Utpal; Datta, Alok K.; Bhattacharya, U.; Parui, S.K.
  • Author_Institution
    Indian Statistical Institute, Kolkata
  • Volume
    3
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    344
  • Lastpage
    347
  • Abstract
    This paper presents a method for the automatic summarization of JBIG2-coded textual images without optical character recognition (OCR). Compressed images are only partially decompressed (less than 10% of the uncompressed image size), and text lines and words are then marked. A few features are computed for each sentence. Based on these feature values, each sentence is marked as either a summary sentence or not. The system finally generates a set of sentences as the summary; in addition, the sentences are ranked within the summary. The experiments consider Indian-language text images. Test results show a sentence selection efficiency of about 56% when judged against summaries generated by humans. A nonparametric (distribution-free) rank statistic gives a correlation coefficient of 0.28 as a measure of the (minimum) strength of association between the machine and human sentence rankings.
  • Keywords
    data compression; document image processing; image coding; natural languages; Indian language textual image summarization; JBIG2 compressed textual image; nonparametric distribution-free rank statistic; Character recognition; Humans; Image coding; Image retrieval; Information retrieval; Libraries; Optical character recognition software; Prototypes; Statistical distributions; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2006. ICPR 2006. 18th International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-2521-0
  • Type
    conf
  • DOI
    10.1109/ICPR.2006.1090
  • Filename
    1699536
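The abstract above reports two evaluation figures, a sentence selection efficiency and a rank correlation, without defining either. The following is a minimal sketch under stated assumptions, not the authors' method. It treats "selection efficiency" as the share of human-selected summary sentences that the machine also selected (a recall-style reading), and it uses Kendall's tau as a stand-in for the unnamed nonparametric rank statistic. The function names `selection_efficiency` and `kendall_tau` and all sample data are hypothetical.

```python
from itertools import combinations


def selection_efficiency(machine, human):
    """Fraction of human-selected sentence ids that the machine also selected.

    ASSUMPTION: the abstract does not define "sentence selection efficiency";
    this recall-style ratio is only one plausible reading.
    """
    human = set(human)
    if not human:
        raise ValueError("human summary is empty")
    return len(human & set(machine)) / len(human)


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same sentences (no ties).

    ASSUMPTION: the paper names only "a nonparametric (distribution-free)
    rank statistic"; Kendall's tau is used here purely as an illustration.
    rank_a and rank_b map each sentence id to its rank position.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same sentences")
    items = list(rank_a)
    n = len(items)
    if n < 2:
        raise ValueError("need at least two ranked sentences")
    concordant = discordant = 0
    # A pair is concordant if both rankings order it the same way.
    for x, y in combinations(items, 2):
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical example data (not from the paper).
machine_pick = [1, 3, 4, 7]
human_pick = [1, 2, 4, 6]
print(selection_efficiency(machine_pick, human_pick))  # 0.5

machine_rank = {"s1": 1, "s2": 2, "s3": 3, "s4": 4}
human_rank = {"s1": 2, "s2": 1, "s3": 3, "s4": 4}
print(kendall_tau(machine_rank, human_rank))  # 5 concordant, 1 discordant: 4/6
```

Kendall's tau is chosen here only because it is distribution-free and easy to compute by pair counting; Spearman's rho would be an equally plausible reading of the abstract.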