• DocumentCode
    2992887
  • Title
    HIMA: A Holistic Data Instance Matching Approach
  • Author
    Miao, Jiajia; Chen, Guoyou; Li, Aiping; Yan, Jia; Jiang, Siyu
  • Author_Institution
    Inst. of Command Autom., PLA Univ. of Sci. & Technol., Nanjing, China
  • fYear
    2010
  • fDate
    25-27 June 2010
  • Firstpage
    5242
  • Lastpage
    5245
  • Abstract
    Considering consistency at the instance level, we propose a Holistic Data Instance Matching Approach (HIMA). First, we measure the similarity between instances using string-distance algorithms. HIMA applies a clustering algorithm, which allows it to process large-scale data sources holistically. In addition, we use a keyword extraction method based on the maximum entropy model to filter out irrelevant information. Experimental results show that the keyword extraction algorithm achieves 70% precision and that the conditional-probability-based algorithm is more precise than the token-based algorithm. The HIMA method achieves 83% accuracy.
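    The abstract does not say which string distance or which clustering algorithm HIMA uses. As a hedged illustration only, the sketch below assumes normalized Levenshtein similarity as the string distance and threshold-based single-link clustering (via union-find) as the grouping step. The function names `levenshtein`, `similarity`, and `cluster_instances` and the `threshold` value of 0.8 are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalized similarity in [0, 1]; 1.0 means identical."""
    a, b = a.lower().strip(), b.lower().strip()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cluster_instances(names: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Single-link clustering: any pair with similarity >= threshold is merged,
    so instances end up grouped transitively (assumed, not HIMA's stated method)."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair once and union the ones that look like the same entity.
    for i, j in combinations(range(len(names)), 2):
        if similarity(names[i], names[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect members under their root representative.
    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())
```

    For example, `cluster_instances(["Nanjing University", "Nanjing Universty", "Wuhan University"])` places the two Nanjing spellings (similarity about 0.94) in one cluster and leaves "Wuhan University" on its own. The all-pairs loop is quadratic; a system aimed at large data sources would likely add blocking or indexing, which this sketch omits.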
  • Keywords
    data analysis; entropy; information retrieval; pattern clustering; string matching; HIMA; clustering algorithm; holistic data instance matching; keyword extracting method; maximum entropy model; string distance; Clustering algorithms; Computational modeling; Computers; Couplings; Data mining; Entropy; Programmable logic arrays; clustering; instance matching; maximum entropy model; string distance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Electrical and Control Engineering (ICECE), 2010 International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-6880-5
  • Type
    conf
  • DOI
    10.1109/iCECE.2010.1272
  • Filename
    5630513