DocumentCode :
174892
Title :
On the Use of Reliable-Negatives Selection Strategies in the PU Learning Approach for Quality Flaws Prediction in Wikipedia
Author :
Ferretti, Edgardo ; Errecalde, Marcelo L. ; Anderka, Maik ; Stein, Benno
Author_Institution :
Dept. de Inf., Univ. Nac. de San Luis, San Luis, Argentina
fYear :
2014
fDate :
1-5 Sept. 2014
Firstpage :
211
Lastpage :
215
Abstract :
Learning from positive and unlabeled examples (PU learning) has proven to be an effective method in several Web mining applications. In particular, in the 1st International Competition on Quality Flaw Prediction in Wikipedia in 2012, a tailored PU learning approach performed best among the competitors. A key feature of that approach is the introduction of sampling strategies within the original PU learning procedure. This paper revisits the winning approach of 2012 and elaborates on neglected aspects in order to provide evidence for the usefulness of sampling in PU learning. To this end, we propose a modification to this PU learning approach, and we show how the different sampling strategies affect flaw prediction effectiveness. Our analysis is based on the original evaluation corpus of the 2012 competition on quality flaw prediction. A main outcome is that, under the best sampling strategy, our modified version of PU learning increases flaw prediction effectiveness by 18.31% on average compared with the winning approach of the competition.
Keywords :
Web sites; learning (artificial intelligence); sampling methods; PU learning approach; Web mining applications; Wikipedia; flaw prediction effectiveness; positive and unlabeled examples; quality flaws prediction; reliable-negative selection strategies; sampling strategies; Electronic publishing; Encyclopedias; Internet; Reliability; Support vector machines; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on
Conference_Location :
Munich
ISSN :
1529-4188
Print_ISBN :
978-1-4799-5721-7
Type :
conf
DOI :
10.1109/DEXA.2014.52
Filename :
6974851