Title :
A Scriptable, Statistical Oracle for a Metadata Extraction System
Author :
Maly, Kurt J. ; Zeil, Steven J. ; Zubair, Mohammad ; Amrou, Ashraf ; Aazhar, Ali ; Ratkal, Naveen
Author_Institution :
Old Dominion Univ., Norfolk
Abstract :
An oracle is described for dynamic validation of an application (metadata extraction from scanned documents) where a moderate failure rate is acceptable, provided that individual failures can be identified during operation. The oracle combines a variety of deterministic tests with statistical tests based upon characteristics of the document collection on which the system operates. Because this system must adapt to a variety of document collections with different characteristics, a scripting language is developed that binds combinations of tests to the metadata fields expected in a given document collection. The suitability of the oracle is demonstrated by an experiment measuring its ability to mimic human judgments as to which of several alternative outputs for the same document would be preferred.
Keywords :
authoring languages; meta data; document collection; metadata extraction system; moderate failure rate; scripting language; statistical oracle; Application software; Computer errors; Computer science; Data mining; Engines; Error correction; Humans; Optical character recognition software; System testing; XML;
Conference_Title :
Quality Software, 2007. QSIC '07. Seventh International Conference on
Conference_Location :
Portland, OR
Print_ISBN :
978-0-7695-3035-2
DOI :
10.1109/QSIC.2007.4385526
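The abstract describes an oracle that scores extracted metadata by combining deterministic tests with statistical tests derived from the document collection, binds tests to fields per collection, and ranks alternative outputs for the same document. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation or their scripting language. The test functions (`nonempty`, `no_garbage`, `length_model`), the "debris" character set, the Gaussian length score, the averaging rule, and the plain dictionary standing in for the field-to-test bindings are all hypothetical choices made for illustration.

```python
import math
import statistics
from typing import Callable, Dict, List

# A test maps a field value to a score in [0, 1]; 1.0 means "looks correct".
Test = Callable[[str], float]


def nonempty(value: str) -> float:
    # Deterministic test: an empty field is always treated as a failure.
    return 1.0 if value.strip() else 0.0


def no_garbage(value: str) -> float:
    # Deterministic test: reject values containing characters assumed
    # (hypothetically) to indicate OCR debris.
    return 0.0 if any(ch in "~|^\\{}" for ch in value) else 1.0


def length_model(samples: List[str]) -> Test:
    # Statistical test: compare a value's length with the lengths seen in a
    # sample of known-good values for this field (needs at least 2 samples).
    lengths = [len(s) for s in samples]
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths) or 1.0  # avoid dividing by zero
    def score(value: str) -> float:
        z = (len(value) - mean) / sd
        return math.exp(-0.5 * z * z)  # 1.0 at the mean, decays with |z|
    return score


def field_score(value: str, tests: List[Test]) -> float:
    # Hypothetical combination rule: average of the bound tests' scores.
    return sum(t(value) for t in tests) / len(tests)


def record_score(record: Dict[str, str], bindings: Dict[str, List[Test]]) -> float:
    # Score every field the bindings expect; a missing field counts as empty.
    scores = [field_score(record.get(f, ""), tests) for f, tests in bindings.items()]
    return sum(scores) / len(scores)


def preferred(candidates: List[Dict[str, str]], bindings: Dict[str, List[Test]]) -> int:
    # Index of the candidate output the oracle would rank highest.
    return max(range(len(candidates)), key=lambda i: record_score(candidates[i], bindings))
```

A usage example with made-up sample data: bind tests to the fields of a collection, then ask which of two candidate extractions is preferred.

```python
bindings = {
    "title": [nonempty, no_garbage,
              length_model(["Metadata Extraction from Scanned Documents",
                            "Statistical Testing of Software Systems"])],
    "author": [nonempty, no_garbage],
}
candidates = [
    {"title": "A Scriptable, Statistical Oracle", "author": "Maly, Kurt J."},
    {"title": "~|^ Old Dominion Univ.", "author": ""},
]
best = preferred(candidates, bindings)  # the first candidate scores higher
```

Keeping the bindings as data separate from the test functions mirrors, in a very simplified way, the abstract's point that each collection needs its own combination of tests per field.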