  • DocumentCode
    1273414
  • Title
    Data categorization using decision trellises
  • Author
    Frasconi, Paolo ; Gori, Marco ; Soda, Giovanni
  • Author_Institution
    Dept. of Syst. & Inf., Florence Univ., Italy
  • Volume
    11
  • Issue
    5
  • fYear
    1999
  • Firstpage
    697
  • Lastpage
    712
  • Abstract
    We introduce a probabilistic graphical model for supervised learning on databases with categorical attributes. The proposed belief network contains hidden variables that play a role similar to nodes in decision trees; each of their states corresponds either to a class label or to a single attribute test. A major difference from decision trees is that the selection of the attribute to be tested is probabilistic. Thus, the model can be used to assess the probability that a tuple belongs to some class, given the predictive attributes. Unfolding the network along the hidden-state dimension yields a trellis structure whose signal flow is similar to that of second-order connectionist networks. The network encodes context-specific probabilistic independencies to reduce parametric complexity. We present a custom-tailored inference algorithm and derive a learning procedure based on the expectation-maximization algorithm. We propose decision trellises as an alternative to decision trees for tuple categorization in databases, which is an important step in building data mining systems. Preliminary experiments on standard machine learning databases are reported, comparing the classification accuracy of decision trellises with that of decision trees induced by C4.5. In particular, we show that the proposed model can offer significant advantages for sparse databases in which many predictive attributes are missing.
  • Keywords
    belief networks; data mining; decision trees; deductive databases; inference mechanisms; learning (artificial intelligence); neural nets; optimisation; probability; belief network; categorical attributes; class label; classification accuracy; context specific probabilistic independencies; custom tailored inference algorithm; data categorization; data mining systems; decision trees; decision trellises; expectation-maximization algorithm; hidden states dimension; hidden variables; learning procedure; parametric complexity; predictive attributes; probabilistic graphical model; second order connectionist networks; signal flow; single attribute test; sparse databases; standard machine learning databases; supervised learning; trellis structure; tuple categorization; Data mining; Databases; Decision trees; Expectation-maximization algorithms; Graphical models; Inference algorithms; Machine learning algorithms; Predictive models; Supervised learning; Testing;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/69.806931
  • Filename
    806931