Title :
A Weighted k-Nearest Neighborhood for BaseNP Detection under Covariate Shift
Author :
Son, Jeong-Woo ; Park, Seong-Bae ; Han, Young-Jin ; Park, Se-Young
Author_Institution :
Dept. of Comput. Eng., Kyungpook Nat. Univ., Daegu
Abstract :
Common machine learning methods rest on the basic assumption that training data and test data are sampled from the same distribution. In practice, however, this assumption is often violated. The situation in which the training and test data are generated from different distributions is called covariate shift. In natural language processing, covariate shift is highly likely to occur because of the size of the sample space: natural language data are theoretically infinite in size, so the distribution of the training data cannot reflect that of the entire data. In this paper, we try to verify that the performance of natural language processing methods can be improved by reducing the error caused by covariate shift. For this purpose, we propose an importance-weighted k-NN for base noun phrase (baseNP) detection. In the proposed method, the weights reflect the difference between the training and test distributions. Theoretically, performance under covariate shift can be improved by importance weighting. In the experiments, the proposed method performs better than standard k-NN.
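The abstract does not give the weighting formula, so the following is only a minimal sketch of importance-weighted k-NN in general, not the authors' implementation. It assumes the common density-ratio form of the importance weight, w(x) = p_test(x) / p_train(x), estimated here with a simple Gaussian kernel density; the function names, the bandwidth, and the kernel estimator are all hypothetical choices made for illustration.

```python
import numpy as np

def kernel_density(points, queries, bandwidth):
    """Gaussian kernel density of `points` evaluated at `queries`.

    The normalizing constant is omitted: it cancels in the density
    ratio below because both densities share one bandwidth and dimension.
    """
    diffs = queries[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)
    return np.mean(np.exp(-sq_dist / (2.0 * bandwidth ** 2)), axis=1)

def importance_weights(x_train, x_test, bandwidth=1.0, eps=1e-12):
    """Assumed weight for each training point: p_test(x) / p_train(x)."""
    p_test = kernel_density(x_test, x_train, bandwidth)
    p_train = kernel_density(x_train, x_train, bandwidth)
    return p_test / (p_train + eps)

def weighted_knn_predict(x_train, y_train, weights, query, k=3):
    """Each of the k nearest training points votes with its importance weight."""
    dist = np.linalg.norm(x_train - query, axis=1)
    nearest = np.argsort(dist)[:k]
    votes = {}
    for i in nearest:
        label = y_train[i]
        votes[label] = votes.get(label, 0.0) + weights[i]
    return max(votes, key=votes.get)
```

With uniform weights this reduces to ordinary majority-vote k-NN; training points that lie where the test data are dense receive larger weights and therefore more influence on the vote.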
Keywords :
learning (artificial intelligence); natural language processing; pattern clustering; base noun detection; baseNP detection; covariate shift; machine learning; natural language data; natural language processing; weighted k-nearest neighborhood; Information technology; Covariate Shift; Machine Learning; NLP; Weighted kNN;
Conference_Title :
Advanced Language Processing and Web Information Technology, 2008. ALPIT '08. International Conference on
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-0-7695-3273-8
DOI :
10.1109/ALPIT.2008.78