• DocumentCode
    1125794
  • Title
    Synthesizing Statistical Knowledge from Incomplete Mixed-Mode Data
  • Author
    Wong, Andrew K.C.; Chiu, David K.Y.
  • Author_Institution
    Department of Systems Design Engineering, University of Waterloo, Waterloo, Ont. N2L 3G1, Canada.
  • Issue
    6
  • fYear
    1987
  • Firstpage
    796
  • Lastpage
    805
  • Abstract
    The difficulties in analyzing and clustering (synthesizing) multivariate data of the mixed type (discrete and continuous) are largely due to 1) nonuniform scaling in different coordinates, 2) the lack of order in nominal data, and 3) the lack of a suitable similarity measure. This paper presents a new approach that bypasses these difficulties and can acquire statistical knowledge from incomplete mixed-mode data. The proposed method adopts an event-covering approach, which covers a subset of statistically relevant outcomes in the outcome space of variable pairs. Once the covered event patterns are acquired, subsequent analysis tasks, such as probabilistic inference, cluster analysis, and detection of event patterns for each cluster based on the incomplete probability scheme, can be performed. There are four phases in our method: 1) discretization of the continuous components based on a maximum entropy criterion, so that the data can be treated as n-tuples of discrete-valued features; 2) estimation of the missing values using our newly developed inference procedure; 3) initial formation of clusters by analyzing the nearest-neighbor distance on subsets of selected samples; and 4) reclassification of the n-tuples into more reliable clusters based on the detected interdependence relationships. For performance evaluation, experiments have been conducted using both simulated and real-life data.
  • Keywords
    Coordinate measuring machines; Data analysis; Decision making; Entropy; Event detection; Pattern analysis; Performance analysis; Phase estimation; Probability; Spatial databases; Cluster analysis; event-covering; incomplete probability scheme; mixed-mode data; probabilistic inference; statistical knowledge
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.1987.4767986
  • Filename
    4767986
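The abstract names two concrete mechanisms: discretizing the continuous components under a maximum entropy criterion, and covering statistically relevant outcomes in the outcome space of variable pairs. Below is a minimal Python sketch of both, under stated assumptions. For a fixed number of intervals k, "maximum entropy" is taken to mean equal-frequency (quantile) cuts, since a discrete variable with k outcomes reaches its maximal entropy log k when the outcomes are equally likely. The relevance test is assumed to be Haberman's adjusted residual with a two-sided 5% threshold (|d| > 1.96); the abstract does not specify the statistic. The function names `max_entropy_cuts`, `discretize`, and `covered_events` are hypothetical. This is not the authors' implementation, and the missing-value inference and clustering phases are not shown.

```python
import bisect
import math
from collections import Counter


def max_entropy_cuts(values, k):
    """Return k - 1 cut points splitting the observed values into k
    (approximately) equal-frequency intervals.

    For a fixed k, the entropy of the discretized variable is largest when
    each interval carries probability 1/k, so the cuts are empirical
    quantiles. Missing values (None) are ignored when choosing the cuts.
    """
    xs = sorted(v for v in values if v is not None)
    n = len(xs)
    return [xs[(i * n) // k] for i in range(1, k)]


def discretize(values, cuts):
    """Map each value to an interval index 0..k-1; a missing value stays None."""
    return [None if v is None else bisect.bisect_right(cuts, v) for v in values]


def covered_events(xs, ys, z_crit=1.96):
    """Return {(a, b): adjusted residual} for outcome pairs of two discrete
    variables whose adjusted residual exceeds z_crit in magnitude.

    ASSUMPTION: Haberman's adjusted residual is used as the relevance test;
    the abstract does not name the statistic. Only samples where both values
    are observed contribute, so no imputation is needed here.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    obs = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    covered = {}
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            # Variance factor of the standardized residual under independence.
            v = (1 - row[a] / n) * (1 - col[b] / n)
            if v <= 0:
                continue  # a constant variable carries no dependence information
            d = (obs[(a, b)] - expected) / math.sqrt(expected * v)
            if abs(d) > z_crit:
                covered[(a, b)] = round(d, 3)
    return covered


# Toy mixed-mode data: one continuous and one nominal feature, with gaps.
temp = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1, None, 1.05, 5.05]
label = ["a", "a", "a", "a", "b", "b", "b", "b", "a", None, None]

cuts = max_entropy_cuts(temp, 2)
print(cuts)  # [4.8]
print(covered_events(discretize(temp, cuts), label))
# {(0, 'a'): 2.828, (0, 'b'): -2.828, (1, 'a'): -2.828, (1, 'b'): 2.828}
```

In this sketch, tuples with a missing value in either variable are skipped pair by pair rather than imputed first, so the pairwise statistics remain usable on incomplete data. Positive residuals mark outcome pairs that co-occur more often than independence predicts; negative ones mark pairs that co-occur less often.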