Title :
Integration of Data Warehouse and Unstructured Business Documents
Author :
Alqarni, Ahmad Abdullah ; Pardede, Eric
Author_Institution :
Dept. of Comput. Sci. & Comput. Eng., La Trobe Univ., Melbourne, VIC, Australia
Abstract :
The profusion of unstructured data has forced organizations to manage such data and take advantage of it, especially in the decision-making process. Integrating or mapping unstructured data to a data warehouse is becoming increasingly important for bridging this gap and realizing the full potential of this data. In this paper, we propose a multi-layer schema for mapping between structured data stored in a data warehouse and unstructured data in business-related documents. The multi-layer schema facilitates the mapping between the two kinds of data. Linguistically correlated data is identified using WordNet to enable integration of the two data sources. We also propose a generic XML schema for business-related unstructured documents to assist the mapping. Using WordNet to identify matches is promising in the absence of schema instances and without the need for domain-specific knowledge.
Keywords :
XML; data integration; data warehouses; decision making; WordNet; business-related unstructured documents; data sources; data warehouse integration; decision making process; generic XML schema; linguistic correlated data; multilayer schema; unstructured data forced organizations; unstructured data mapping; Data mining; Data models; Organizations; Semantics; XML schema matching; data warehouse; schema mapping; unstructured document
Conference_Title :
Network-Based Information Systems (NBiS), 2012 15th International Conference on
Conference_Location :
Melbourne, VIC
Print_ISBN :
978-1-4673-2331-4
DOI :
10.1109/NBiS.2012.59