• DocumentCode
    3490028
  • Title
    Invariants Extraction Method Applied in an Omni-language Old Document Navigating System
  • Author
    Bui, Quang Anh ; Visani, Muriel ; Mullot, Remy
  • Author_Institution
    Lab. L3i, Univ. of La Rochelle, La Rochelle, France
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    1325
  • Lastpage
    1329
  • Abstract
    We are currently working on the concept of an omni-script, interactive word retrieval system for navigating ancient document collections, based on query composition for non-expert users. To build a query, the user selects and composes writing pieces, which are invariants automatically extracted from the old document collection. To extract invariants from documents, strokes must first be extracted and clustered. Stroke extraction raises two main difficulties: detecting the ambiguous zones so as to extract primary strokes (writing pieces that do not contain any ambiguous zone), and grouping the primary strokes so as to form invariants. In this paper, we present existing methods for ambiguous zone detection and compare them on documents of different languages and periods to find out which one is best suited to our context. Once ambiguous zones have been extracted, some neighboring primary strokes are grouped to obtain strokes, and our clustering algorithm is applied to these strokes to find their representatives, i.e. the invariants. The user can then use these invariants to compose a query and retrieve words from the document collection.
  • Keywords
    document handling; history; pattern clustering; query processing; ambiguity zones detection; ancient document collection navigation; clustering algorithm; interactive word retrieval system; invariants extraction method; omni-language old document navigation system; omni-script language; primary strokes; query composition; stroke extraction; user composition; user selection; Algorithm design and analysis; Clustering algorithms; Databases; Feature extraction; Shape; Visualization; Writing; Ambiguous Zones Detection; Clustering; Invariant Extraction; Stroke Extraction; Word Retrieval;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.268
  • Filename
    6628829