DocumentCode :
3614661
Title :
SumatraTT: a generic data pre-processing system
Author :
P. Aubrecht;P. Miksovsky;L. Kral
Author_Institution :
Dept. of Cybern., Czech Tech. Univ., Prague, Czech Republic
fYear :
2003
fDate :
2003
Firstpage :
120
Lastpage :
124
Abstract :
A systematic process of indexing cultural heritage artefacts began well before the era of computers. The first step in digitising such archives of handwritten and typewritten data naturally focused on transferring these files into digital form, either by re-typing the original data manually or by applying OCR methods to scanned documents. As a result, huge digital archives of data and metadata now exist in Europe, describing millions of artefacts kept by thousands of galleries, museums, and private collections. To explore such archives (including by data mining methods), the data need to be converted into a unified format and data model. Moreover, the original indexing methodologies may vary significantly, so conversion to a unified metadata (ontology) model is also needed. Any data transformation is a tedious task that usually requires designing, implementing, and testing a number of scripts, which are then executed to transform the data sets. To simplify such data transformation processes, a generic data transformation system called SumatraTT has been developed at the Gerstner Laboratory of the Czech Technical University in Prague. The system has been verified on a number of applications, mostly as a data pre-processing system within the data mining process. Currently, the goals of the CIPHER project have opened new research directions aimed at investigating ontology transformation and unification problems using SumatraTT.
Keywords :
"Data processing","Databases","Cultural differences","Data mining","Ontologies","Indexing","Laboratories","Cybernetics","Optical character recognition software","Europe"
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Applications, 2003. Proceedings. 14th International Workshop on
ISSN :
1529-4188
Print_ISBN :
0-7695-1993-8
Type :
conf
DOI :
10.1109/DEXA.2003.1232010
Filename :
1232010