DocumentCode
3490028
Title
Invariants Extraction Method Applied in an Omni-language Old Document Navigating System
Author
Bui, Quang Anh ; Visani, Muriel ; Mullot, Remy
Author_Institution
Lab. L3i, Univ. of La Rochelle, La Rochelle, France
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
1325
Lastpage
1329
Abstract
We are currently working on the concept of an omni-script, interactive word retrieval system for navigating ancient document collections, based on query composition by non-expert users. To build a query, the user selects and composes writing pieces, which are invariants automatically extracted from the old document collection. To extract invariants from documents, strokes must first be extracted and clustered. Stroke extraction raises two main difficulties: detecting the ambiguous zones so as to extract primary strokes (writing pieces that do not contain any ambiguous zone), and grouping the primary strokes so as to form invariants. In this paper, we present existing methods for ambiguous zone detection and compare them on documents of different languages and periods to determine which is best suited to our context. Once ambiguous zones have been extracted, some neighboring primary strokes are grouped to obtain strokes, and our clustering algorithm is applied to these strokes to find their representatives, i.e. the invariants. The user can then use these invariants to compose his/her query and retrieve words from the document collection.
Keywords
document handling; history; pattern clustering; query processing; ambiguity zones detection; ancient document collection navigation; clustering algorithm; interactive word retrieval system; invariants extraction method; omni-language old document navigation system; omni-script language; primary strokes; query composition; stroke extraction; user composition; user selection; Algorithm design and analysis; Clustering algorithms; Databases; Feature extraction; Shape; Visualization; Writing; Ambiguous Zones Detection; Clustering; Invariant Extraction; Stroke Extraction; Word Retrieval;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.268
Filename
6628829