• DocumentCode
    177752
  • Title

    Multipage Administrative Document Stream Segmentation

  • Author

    Daher, Hani ; Bouguelia, Mohamed-Rafik ; Belaid, Abdel ; D'Andecy, Vincent Poulain

  • Author_Institution
    LORIA, Univ. de Lorraine, Vandœuvre-lès-Nancy, France
  • fYear
    2014
  • fDate
    24-28 Aug. 2014
  • Firstpage
    966
  • Lastpage
    971
  • Abstract
    We propose a framework for the segmentation and classification of document streams. The framework is composed of two modules: segmentation and verification. Both modules use an incremental classifier that learns progressively along the stream. In the segmentation module, the relationship between two consecutive pages is classified as either continuity or rupture. A rupture is synonymous with a clear break and thus probably delimits a complete document. If the classifier is uncertain whether the relationship is a continuity or a rupture, an over-segmentation is proposed and the result is treated as a fragment, i.e. a portion of a document. Both fragments and documents are sent to the verification module, which includes a correction module in addition to the incremental classifier. The classifier predicts the classes of fragments and documents. The predicted class represents a context, which is used as a query to search for similar contexts in the correction module and to correct the segmentation and verification results. Corrections are sent back to the segmentation and verification modules so that they learn the correct classes. Results on real-world databases show the effectiveness and stability of our approach.
  • Keywords
    document handling; pattern classification; correction module; document stream classification; incremental classifier; multipage administrative document stream segmentation; over-segmentation; rupture; segmentation module; verification module; Accuracy; Bar codes; Context; Databases; Feature extraction; Heuristic algorithms; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2014 22nd International Conference on
  • Conference_Location
    Stockholm
  • ISSN
    1051-4651
  • Type

    conf

  • DOI
    10.1109/ICPR.2014.176
  • Filename
    6976886
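
The segmentation step described in the abstract (continuity, rupture, or an uncertain relationship that triggers over-segmentation into fragments) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the per-pair continuity scores stand in for the output of the incremental classifier, the two fixed thresholds `low` and `high` stand in for its uncertainty criterion, and the names `Segment`, `label_relationship`, and `segment_stream` are hypothetical. The verification and correction modules are not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    pages: List[int]  # zero-based page indices in stream order
    kind: str         # "document" or "fragment"


def label_relationship(p_continuity: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a continuity score to a label (thresholds are illustrative assumptions)."""
    if p_continuity >= high:
        return "continuity"
    if p_continuity <= low:
        return "rupture"
    return "uncertain"


def segment_stream(p_continuity: Sequence[float],
                   low: float = 0.3, high: float = 0.7) -> List[Segment]:
    """Cut a page stream into documents and fragments.

    p_continuity[i] is a hypothetical score for the pair (page i, page i + 1).
    A rupture cuts the stream; an uncertain relationship also cuts it
    (over-segmentation), and every segment touching such a cut is a fragment.
    """
    segments: List[Segment] = []
    current = [0]
    opened_by_uncertain = False
    for i, score in enumerate(p_continuity):
        label = label_relationship(score, low, high)
        if label == "continuity":
            current.append(i + 1)
            continue
        closed_by_uncertain = label == "uncertain"
        is_fragment = opened_by_uncertain or closed_by_uncertain
        segments.append(Segment(current, "fragment" if is_fragment else "document"))
        current = [i + 1]
        opened_by_uncertain = closed_by_uncertain
    # The end of the stream closes the last segment.
    segments.append(Segment(current, "fragment" if opened_by_uncertain else "document"))
    return segments


if __name__ == "__main__":
    # Five pages, four consecutive-page relationships.
    for segment in segment_stream([0.9, 0.1, 0.5, 0.95]):
        print(segment.kind, segment.pages)
```

In this sketch, fragments are the segments that a downstream verification stage would need to re-examine and possibly merge; segments bounded only by clear ruptures are passed on as complete documents.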