DocumentCode :
3695181
Title :
Automatic and interactive rule inference without ground truth
Author :
Cérès Carton;Aurélie Lemaitre;Bertrand Coüasnon
Author_Institution :
IRISA - INSA, Université
fYear :
2015
Firstpage :
696
Lastpage :
700
Abstract :
Designing a document recognition system from non-annotated documents is not an easy task. In general, statistical methods cannot learn without an annotated ground truth, unlike syntactical methods. However, the ability of syntactical methods to deal with non-annotated data comes from the fact that the description is written manually by a user. Adapting to a new kind of document is then tedious, as the whole manual process of knowledge extraction has to be redone. In this paper, we propose a method to extract knowledge and generate rules without any ground truth. Using a large volume of non-annotated documents, it is possible to study the redundancy of some extracted elements in the document images. This redundancy is exploited through an automatic clustering algorithm. An interaction with the user brings semantics to the detected clusters. In this work, the extracted elements are keywords obtained by word spotting. The approach has been applied to field detection in old marriage records from the FamilySearch HIP2013 competition database. The results demonstrate that rules can be inferred automatically from non-annotated documents by using the redundancy of the elements extracted from them.
Keywords :
"Reliability","Learning automata","Niobium","Atmospheric modeling","Manuals"
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type :
conf
DOI :
10.1109/ICDAR.2015.7333851
Filename :
7333851