Title :
Research of PU Text Semi-supervised Classification Based on Ontology Feature Extraction
Author :
Luo, Na ; Yuan, Fuyu ; Zuo, Wanli ; He, Fengling
Author_Institution :
Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun
Abstract :
To address the shortcomings of traditional statistics-based feature extraction on PU (positive-unlabeled) problems, we propose ontology-based feature extraction to improve the performance of PU classification. We improve the PEBL algorithm: we obtain the document vectors of the positive set using ontology-based feature extraction, then find the strong positive features, which carry the semantics shared across the positive documents and occur with higher frequency in the positive set. The improved algorithm scans the documents twice. First, we obtain the semantics of the documents from the ontology. Second, we filter out the terms that carry none of these semantics, which reduces the dimension and yields the document vectors. Experiments show that the improved PEBL classifier increases the F1 score by 0.7389%.
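The two-pass scan described in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy ontology `TOY_ONTOLOGY`, the helper names `collect_concepts` and `build_vectors`, whitespace tokenization, and raw term counts as vector weights are all hypothetical choices that the abstract does not specify.

```python
from collections import Counter

# Hypothetical toy ontology mapping a term to a concept. A real system would
# query an actual ontology; this mapping is purely illustrative.
TOY_ONTOLOGY = {
    "neuron": "biology",
    "cell": "biology",
    "protein": "biology",
    "goal": "sport",
    "match": "sport",
}


def collect_concepts(docs, ontology):
    """Pass 1: gather the concepts that the documents' terms map to."""
    concepts = set()
    for doc in docs:
        for term in doc.lower().split():
            if term in ontology:
                concepts.add(ontology[term])
    return concepts


def build_vectors(docs, ontology, concepts):
    """Pass 2: drop terms that map to none of the concepts, then count the rest."""
    vectors = []
    for doc in docs:
        # Terms missing from the ontology yield None, which is never in the set.
        kept = [t for t in doc.lower().split() if ontology.get(t) in concepts]
        vectors.append(Counter(kept))
    return vectors


positive_docs = ["The neuron is a cell", "Protein folding in the cell"]
concepts = collect_concepts(positive_docs, TOY_ONTOLOGY)
vectors = build_vectors(positive_docs, TOY_ONTOLOGY, concepts)
```

In this sketch, terms with no concept in the positive set (here "the", "is", "folding") are discarded in the second pass, which is where the dimension reduction comes from.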
Keywords :
classification; feature extraction; learning (artificial intelligence); ontologies (artificial intelligence); text analysis; PEBL algorithm; dimension reduction; document semantics; document vector; ontology feature extraction; positive-unlabeled semisupervised text classification; Application software; Chemical technology; Computer science; Educational institutions; Feature extraction; Frequency; Laboratories; Machine learning; Ontologies; Text categorization; F Score; Ontology; PU; Semi-supervised;
Conference_Title :
Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-0-7695-3495-4
DOI :
10.1109/ICMLA.2008.19