Title :
A good all-around semi-supervised learning algorithm for information categorization
Author :
Liu, Lizhen ; Chen, Hai ; Du, Chao
Author_Institution :
Inf. Eng. Coll., CNU, Beijing, China
Abstract :
This paper reports a study of information categorization based on highly efficient feature selection and a comprehensive semi-supervised learning algorithm. Feature selection and feature conversion, both linear and non-linear, are performed using the maximum mutual information criterion. Entropy is used and extended, together with a machine learning method, to identify suitable features. A fuzzy partition clustering method is presented and used to automatically obtain a few labeled samples and some external clusters by measuring the similarity of cluster-correlated documents. These provide the categorization basis for supervised learning. Naive Bayes augment learning is then incorporated to design and train the categorizers, and an estimate of the classification error loss helps balance the selection of candidates. The authors report that the all-around learning algorithm can greatly improve the precision and efficiency of Web information categorization.
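As a rough illustration of the mutual-information feature selection mentioned in the abstract, the sketch below scores each term by the mutual information between its presence in a document and the class label, then keeps the top-scoring terms. This is not the authors' method: the function names `mutual_information` and `select_features`, the bag-of-words-as-sets representation, and the toy documents are all assumptions made for illustration, and the abstract's linear and non-linear feature conversions are not covered.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the class label.

    `docs` is a list of token sets; `labels` is a parallel list of class names.
    (Hypothetical helper, not taken from the paper.)
    """
    n = len(docs)
    # Joint counts of (term present?, class) and the two marginals.
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    present = Counter(term in doc for doc in docs)
    classes = Counter(labels)
    mi = 0.0
    for (t, c), count in joint.items():
        p_tc = count / n
        mi += p_tc * math.log2(p_tc / ((present[t] / n) * (classes[c] / n)))
    return mi

def select_features(docs, labels, k):
    """Return the k terms with the highest mutual information with the label."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda w: mutual_information(docs, labels, w),
                    reverse=True)
    return ranked[:k]

# Toy labeled corpus (invented for this example).
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = ["sport", "sport", "finance", "finance"]

top_terms = select_features(docs, labels, 2)
```

In this toy corpus, "goal" and "stock" each determine the class exactly, so they score highest, while "team" appears equally in both classes and carries no information about the label.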
Keywords :
Bayes methods; Internet; classification; fuzzy set theory; learning (artificial intelligence); Web information categorization; entropy; feature selection; fuzzy partition clustering; machine learning; maximum mutual information; naive Bayes augment learning; semisupervised learning algorithm; Accuracy; Chaos; Clustering algorithms; Clustering methods; Information analysis; Machine learning algorithms; Mutual information; Partitioning algorithms; Semisupervised learning; Space technology; component; dimensionality reduction; fuzzy clustering
Conference_Title :
Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-4754-1
Electronic_ISBN :
978-1-4244-4738-1
DOI :
10.1109/ICICISYS.2009.5357843