• DocumentCode
    3179603
  • Title
    Quick induction of NNTrees for text categorization based on discriminative multiple centroid approach
  • Author
    Hayashi, Hirotomo ; Zhao, Qiangfu
  • Author_Institution
    Dept. of Comput. & Inf. Syst., Univ. of Aizu, Aizu-Wakamatsu, Japan
  • fYear
    2010
  • fDate
    10-13 Oct. 2010
  • Firstpage
    705
  • Lastpage
    712
  • Abstract
    The neural network tree (NNTree) is a hybrid model for machine learning. We have previously proposed an efficient algorithm for inducing NNTrees, and verified through experiments that NNTrees are efficient and effective for solving different pattern recognition problems. However, for problems such as text categorization, inducing NNTrees can be computationally very expensive. To address this, we have tried inducing NNTrees after dimensionality reduction. Specifically, we have studied the linear discriminant analysis (LDA) based approach, the principal component analysis (PCA) based approach, and the direct centroid (DC) based approach. Results show that DC is simple but not effective, whereas LDA performs better but incurs a very high computational cost for finding the transformation matrix. To solve the problem more efficiently, we propose in this paper the discriminant multiple centroid (DMC) approach. DMC is a two-stage approach: all data are first mapped to a lower dimensional space based on the centroids, and LDA is then conducted in the mapped space. Experimental results obtained for three public text datasets show that in all cases DMC is much faster than LDA without significant performance degradation.
  • Keywords
    decision trees; learning (artificial intelligence); matrix algebra; neural nets; pattern classification; principal component analysis; text analysis; NNTree; computational cost; direct centroid based approach; discriminative multiple centroid approach; hybrid learning model; linear discriminant analysis; machine learning; neural network tree; pattern recognition; principal component analysis; public text dataset; quick induction; text categorization; transformation matrix; Artificial neural networks; Pattern recognition; decision tree; dimensionality reduction; neural network; text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems Man and Cybernetics (SMC), 2010 IEEE International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1062-922X
  • Print_ISBN
    978-1-4244-6586-6
  • Type
    conf
  • DOI
    10.1109/ICSMC.2010.5641834
  • Filename
    5641834
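
The two-stage idea described in the abstract (map the data to a lower dimensional space based on centroids, then run LDA in the mapped space) can be illustrated with a minimal sketch. The abstract does not say how the centroids are obtained or how the mapping is defined, so everything below is an assumption for illustration only: a small per-class k-means picks the centroids, the mapping is the inner product of each document vector with each centroid, and the function names (class_centroids, lda_matrix, dmc_fit, dmc_transform) are hypothetical rather than taken from the paper.

```python
import numpy as np


def class_centroids(X, y, k=2, iters=10, seed=0):
    """Up to k centroids per class via a small k-means (assumed choice)."""
    rng = np.random.default_rng(seed)
    centroids = []
    for c in np.unique(y):
        Xc = X[y == c]
        kk = min(k, len(Xc))
        # Initialise from randomly chosen samples of this class.
        C = Xc[rng.choice(len(Xc), size=kk, replace=False)].copy()
        for _ in range(iters):
            dist = ((Xc[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
            assign = dist.argmin(axis=1)
            for j in range(kk):
                if np.any(assign == j):  # keep old centroid if cluster is empty
                    C[j] = Xc[assign == j].mean(axis=0)
        centroids.append(C)
    return np.vstack(centroids)


def lda_matrix(Z, y):
    """Fisher LDA projection computed in the (small) mapped space."""
    classes = np.unique(y)
    m = Z.shape[1]
    mu = Z.mean(axis=0)
    Sw = np.zeros((m, m))  # within-class scatter
    Sb = np.zeros((m, m))  # between-class scatter
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        d = (mc - mu)[:, None]
        Sb += len(Zc) * (d @ d.T)
    # Pseudo-inverse guards against a singular within-class scatter.
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)[: len(classes) - 1]
    return vecs[:, order].real


def dmc_fit(X, y, k=2):
    """Stage 1: centroid mapping. Stage 2: LDA in the mapped space."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    C = class_centroids(X, y, k)
    Z = X @ C.T  # assumed mapping: inner product with each centroid
    W = lda_matrix(Z, y)
    return C, W


def dmc_transform(X, C, W):
    """Apply both stages to new data."""
    return (np.asarray(X, dtype=float) @ C.T) @ W
```

Under these assumptions, the scatter matrices used by LDA have one row and column per centroid instead of one per original feature, which is where the reduction in the cost of finding the transformation matrix would come from. A call such as `C, W = dmc_fit(X, y, k=2)` followed by `dmc_transform(X, C, W)` yields features with at most (number of classes minus 1) dimensions.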