DocumentCode
2992887
Title
HIMA: A Holistic Data Instance Matching Approach
Author
Miao, Jiajia ; Chen, Guoyou ; Li, Aiping ; Yan, Jia ; Jiang, Siyu
Author_Institution
Inst. of Command Autom., PLA Univ. of Sci. & Technol., Nanjing, China
fYear
2010
fDate
25-27 June 2010
Firstpage
5242
Lastpage
5245
Abstract
Considering consistency at the instance level, we propose a Holistic Data Instance Matching Approach (HIMA). First, we measure the similarity of instances using string-distance algorithms. HIMA uses a clustering algorithm, which enables it to handle large-scale data sources holistically. In addition, we use a keyword extraction method based on the maximum entropy model to filter out useless information. The experimental results show that the keyword extraction algorithm achieves 70% precision and that the conditional-probability-based algorithm is more precise than the token-based algorithm. Overall, HIMA achieves 83% accuracy.
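The abstract does not specify which string distance or which clustering algorithm HIMA uses. As a rough illustration only, assuming normalized Levenshtein distance as the string distance and threshold-based single-link clustering as the grouping step (both are our assumptions, not confirmed by this record), the first two stages of such an instance-matching pipeline might be sketched in Python as follows. The function names and the 0.8 threshold are hypothetical, and the maximum-entropy keyword extraction step is not modeled.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings (assumed measure)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cluster_instances(instances: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Hypothetical single-link clustering: any pair of instances whose
    case-insensitive similarity reaches `threshold` is merged (union-find)."""
    parent = list(range(len(instances)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(instances)), 2):
        if similarity(instances[i].lower(), instances[j].lower()) >= threshold:
            parent[find(i)] = find(j)

    # Group instances by their union-find root, preserving input order.
    groups: dict[int, list[str]] = {}
    for idx, inst in enumerate(instances):
        groups.setdefault(find(idx), []).append(inst)
    return list(groups.values())
```

For example, `cluster_instances(["Nanjing University", "Nanjing Universty", "Wuhan University"])` groups the two spellings of "Nanjing University" together and leaves "Wuhan University" on its own. The pairwise comparison is quadratic in the number of instances, so a system aimed at large-scale sources would likely need blocking or indexing in front of it.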
Keywords
data analysis; entropy; information retrieval; pattern clustering; string matching; HIMA; clustering algorithm; holistic data instance matching; keyword extracting method; maximum entropy model; string distance; Clustering algorithms; Computational modeling; Computers; Couplings; Data mining; Entropy; Programmable logic arrays; clustering; instance matching; maximum entropy model; string distance;
fLanguage
English
Publisher
IEEE
Conference_Title
Electrical and Control Engineering (ICECE), 2010 International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-6880-5
Type
conf
DOI
10.1109/iCECE.2010.1272
Filename
5630513