Title :
Semi-automatic information retrieval and consolidation with a sample application
Author :
Korica-Pehserl, Petra ; Maurer, Hermann
Author_Institution :
Inst. for Inf. Syst. & Comput. Media, Graz Univ. of Technol., Graz, Austria
Abstract :
Imagine for a moment a Web where we can extract information from any website, know its context, and automatically assemble it with information from other sources, such as databases, geo-maps, and multimedia files, into a homogeneous document. Today the Web is populated by unstructured or semi-structured data, and the difficulty of consolidating it automatically makes this idea wishful thinking for now. This paper describes an attempt to use semi-automatic information retrieval and consolidation on the largest Austrian online encyclopedia, Austria-Forum. We had access to a database of historic images of Austria containing more than 40,000 images. We describe how we incorporated those images that were suitable for the Austria-Forum, without introducing duplicates. The process comprised a large set of heuristics that accomplished a high percentage of the integration automatically. Only a moderate effort was then required for a human expert to check whether the images did indeed fit the entries proposed by the system.
Keywords :
Web sites; data structures; document handling; encyclopaedias; image retrieval; visual databases; Austria-Forum; Austrian online encyclopedia; Web site; geo-maps; historic image database; homogeneous document; information extraction; multimedia files; semi automatic information consolidation; semi automatic information retrieval; semi structured data; unstructured data; Context; Data mining; Humans; Image databases; Information retrieval; Presses; Data Consolidation; Data Retrieval; Information Integration; Multimedia; Web;
Conference_Title :
Emerging Technologies (ICET), 2012 International Conference on
Conference_Location :
Islamabad
Print_ISBN :
978-1-4673-4452-4
DOI :
10.1109/ICET.2012.6375491