DocumentCode
3078452
Title
Visual integration tool for heterogeneous data type by unified vectorization
Author
Bourennani, Farid ; Pu, Ken Q. ; Zhu, Ying
Author_Institution
Inst. of Technol., Univ. of Ontario, Oshawa, ON, Canada
fYear
2009
fDate
10-12 Aug. 2009
Firstpage
132
Lastpage
137
Abstract
Data integration is the problem of combining data residing at different sources and providing the user with a unified view of these data. One of the critical issues of data integration is the detection of similar entities based on their content. The complexity of this task is due to three factors: the data types of the databases are heterogeneous, the schemas of the databases are unfamiliar and heterogeneous as well, and the number of records is large and time-consuming to analyze. As a solution to these problems, we extend our earlier work by introducing a new measure to handle heterogeneous textual and numerical data types for co-incident meaning extraction. First, in order to accommodate the heterogeneous data types, we propose a new weight called Bin Frequency - Inverse Document Bin Frequency (BF-IDBF) for effective heterogeneous data pre-processing and classification by unified vectorization. Second, in order to handle the unfamiliar data structure, we use the unsupervised Self-Organizing Map (SOM) algorithm. Finally, to help the user explore and browse the semantically similar entities among the copious amount of data, we use a SOM-based visualization tool to map the database tables based on their semantic content.
Keywords
data structures; data visualisation; distributed databases; pattern classification; self-organising feature maps; text analysis; unsupervised learning; bin frequency-inverse document bin frequency; co-incident meaning extraction; data classification; data integration; heterogeneous data pre-processing; heterogeneous data type; heterogeneous database; heterogeneous textual handling; numerical data type; self-organizing map; unfamiliar data structure handling; unified vectorization; unsupervised algorithm; visual integration tool; visualization tool; Data mining; Data structures; Data visualization; Data warehouses; Distributed databases; Frequency measurement; Hardware; History; Information retrieval; Visual databases; Data Integration; Information Retrieval (IR); Pre-Processing; SOM; Visual Data Mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Reuse & Integration, 2009. IRI '09. IEEE International Conference on
Conference_Location
Las Vegas, NV
Print_ISBN
978-1-4244-4114-3
Electronic_ISBN
978-1-4244-4116-7
Type
conf
DOI
10.1109/IRI.2009.5211539
Filename
5211539