DocumentCode :
2087567
Title :
Detecting data and schema changes in scientific documents
Author :
Adam, Nico ; Critchlow, T. ; Musick, R.
Author_Institution :
CIMIC, Rutgers Univ., Newark, NJ
fYear :
2000
fDate :
2000
Firstpage :
160
Lastpage :
170
Abstract :
Data stored in a data warehouse must be kept consistent and up to date with respect to the underlying information sources. Given the capability to identify, categorize, and detect changes in these sources, only the modified data needs to be transferred and entered into the warehouse. The alternative, periodically reloading from scratch, is inefficient. When the schema of an information source changes, all components that interact with, or make use of, data originating from that source must be updated to conform. The change detection problem is the problem of detecting data and schema changes by comparing two versions of the same semi-structured document. We present an approach to detecting data and schema changes for scientific documents. Scientific data is of particular interest because it is normally stored as semi-structured documents and undergoes frequent schema updates. This paper demonstrates the use of graphs to represent scientific documents in particular, and semi-structured documents in general, as well as their schemas. It also demonstrates an approach that detects data and schema changes efficiently by merging detection with the parsing of the document.
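As a rough illustration of the change detection problem stated in the abstract, the following minimal sketch compares two versions of a semi-structured document. It is not the authors' method: the paper merges detection with parsing, whereas this sketch assumes both versions are already parsed into nested dictionaries (a simple tree-shaped stand-in for the graph representation). The names `flatten` and `detect_changes`, and the rule that a label path appearing or disappearing counts as a schema change, are assumptions made for illustration only.

```python
from typing import Any, Dict, List, Tuple

Path = Tuple[str, ...]


def flatten(node: Any, path: Path = ()) -> Dict[Path, Any]:
    """Map every root-to-leaf label path to its leaf value.

    Hypothetical helper: a dict is treated as an internal node whose keys
    are edge labels; anything else is treated as a leaf value.
    """
    if isinstance(node, dict):
        leaves: Dict[Path, Any] = {}
        for label, child in node.items():
            leaves.update(flatten(child, path + (label,)))
        return leaves
    return {path: node}


def detect_changes(old: Any, new: Any) -> Dict[str, List[Path]]:
    """Classify differences between two versions of a document.

    Assumption: a label path present in only one version is a schema
    change; a path present in both with different values is a data change.
    """
    old_leaves, new_leaves = flatten(old), flatten(new)
    shared = old_leaves.keys() & new_leaves.keys()
    return {
        "schema_added": sorted(new_leaves.keys() - old_leaves.keys()),
        "schema_removed": sorted(old_leaves.keys() - new_leaves.keys()),
        "data_changed": sorted(p for p in shared if old_leaves[p] != new_leaves[p]),
    }
```

Because only the paths listed under `data_changed` differ in value, a warehouse loader built on such a comparison would need to transfer just those entries instead of reloading the whole source, while non-empty `schema_added` or `schema_removed` lists would signal that dependent components need updating.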
Keywords :
data warehouses; document handling; graphs; natural sciences computing; data change detection; data warehouse; graphs; information sources; parsing; schema change detection; schema updates; scientific documents; semi-structured document; Automation; Bioinformatics; Data mining; Data warehouses; Genomics; Laboratories; Merging; Warehousing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advances in Digital Libraries, 2000. Proceedings. IEEE
Conference_Location :
Washington, DC
Print_ISBN :
0-7695-0659-3
Type :
conf
DOI :
10.1109/ADL.2000.848379
Filename :
848379