Title :
Data modeling in machine learning based on information-theoretic measures
Author :
Liu, Yun-Hui ; Li, Ai-jun ; Luo, Si-Wei
Author_Institution :
Dept. of Comput. Sci., Northern Jiaotong Univ., Beijing, China
Abstract :
Data modeling is a key problem in machine learning. In conventional machine learning, much research has focused on a specific method for a specific environment, in which models are generally selected and built by ad hoc methods, by "trial and error", or solely from "expert" knowledge or intuition. As a result, the effectiveness of such models is limited, and the research results often neither contribute to a fundamental understanding of the field nor lend themselves to broader problem domains. This paper aims to provide theoretical foundations, as well as useful tools, to guide model building and to explain and evaluate model performance using several information-theoretic measures, namely entropy, conditional entropy, relative entropy, information gain, and information cost. These measures characterize the regularity of a data set and thus contribute to data modeling.
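The measures named in the abstract have standard Shannon-information definitions; the paper's own formulations are not given here, so the following is a minimal sketch of the textbook versions (entropy, conditional entropy, information gain, and relative entropy), computed over small discrete samples:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a discrete label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(features, labels):
    """H(Y|X): entropy of the labels averaged over each feature value."""
    n = len(labels)
    h = 0.0
    for x in set(features):
        subset = [y for f, y in zip(features, labels) if f == x]
        h += (len(subset) / n) * entropy(subset)
    return h

def information_gain(features, labels):
    """IG(Y; X) = H(Y) - H(Y|X): reduction in label uncertainty from X."""
    return entropy(labels) - conditional_entropy(features, labels)

def relative_entropy(p, q):
    """KL divergence D(p || q) between two discrete distributions, in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A feature that perfectly separates the labels removes all uncertainty:
# entropy(['a','a','b','b']) is 1 bit, and the gain from a perfectly
# aligned feature is the full 1 bit.
print(entropy(['a', 'a', 'b', 'b']))                              # 1.0
print(information_gain(['x', 'x', 'y', 'y'], ['a', 'a', 'b', 'b']))  # 1.0
```

A lower conditional entropy (equivalently, a higher information gain) for a feature indicates greater regularity in the data with respect to that feature, which is the sense in which such measures can guide model selection. "Information cost" has no single standard definition and is omitted from the sketch.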
Keywords :
data models; entropy; learning (artificial intelligence); conditional entropy; data modeling; information cost; information gain; information-theoretic measures; machine learning; model building; relative entropy; Computer science; Costs; Data analysis; Data models; Entropy; Gain measurement; Impurities; Machine learning; Partial response channels; Performance gain;
Conference_Title :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
DOI :
10.1109/ICMLC.2002.1167394