Title :
A Co-training Approach based TEF-WA technique
Author :
Huanling, Tang ; Mingyu, Lu ; Na, Liu
Author_Institution :
Dalian Maritime Univ., Dalian
Abstract :
Traditional categorization algorithms suffer from a lack of sufficient labeled training data, while large amounts of unlabeled data are readily available. We investigate the co-training algorithm and its assumption that the feature set can be split into two compatible and independent views. In practice, however, this assumption is usually violated to some degree, and sometimes no natural feature split exists. We therefore adopt the TEF_WA technique, which uses term evaluation functions to split the feature set and construct multiple views, from which we can choose a pair of views that are compatible and independent to a certain degree. Based on the TEF_WA technique, we develop a semi-supervised categorization algorithm, Co_CLM. Experimental results show that Co_CLM can significantly decrease classification error by utilizing unlabeled data, especially when labeled data is sparse. Our experimental results also indicate that Co_CLM achieves more satisfactory performance with more independent view pairs.
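The co-training idea the abstract refers to can be illustrated with a minimal sketch. This is a generic co-training loop, not Co_CLM and not the TEF_WA split: the abstract gives no details of either, so the two views are simply passed in as lists of feature indices, and the nearest-centroid base learner, the margin-based confidence, and all function names (`centroid_fit`, `centroid_predict`, `project`, `co_train`) are assumptions made for illustration.

```python
import math

def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def centroid_predict(model, row):
    """Return (label, margin); margin is the distance gap between the
    two nearest centroids and serves as a crude confidence score."""
    dists = sorted((math.dist(row, mu), c) for c, mu in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def project(row, view):
    """Restrict a feature vector to the feature indices of one view."""
    return [row[j] for j in view]

def co_train(X_lab, y_lab, X_unlab, view_a, view_b, rounds=5, per_round=1):
    """Generic co-training: in each round, a classifier trained on each
    view labels its most confident unlabeled examples, and those examples
    join the shared labeled set used by both views."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        for view in (view_a, view_b):
            if not pool:
                break
            model = centroid_fit([project(r, view) for r in X_lab], y_lab)
            scored = sorted(
                ((centroid_predict(model, project(r, view)), i)
                 for i, r in enumerate(pool)),
                key=lambda t: t[0][1],
                reverse=True,
            )
            chosen = scored[:per_round]
            for (label, _margin), i in chosen:
                X_lab.append(pool[i])
                y_lab.append(label)
            # Remove the newly labeled examples, highest index first.
            for i in sorted((i for _, i in chosen), reverse=True):
                pool.pop(i)
    models = [
        centroid_fit([project(r, v) for r in X_lab], y_lab)
        for v in (view_a, view_b)
    ]
    return models, X_lab, y_lab
```

In this sketch, a view-construction step such as the one the abstract describes would supply `view_a` and `view_b`; here they are assumed to be given, for example `[0, 1]` and `[2, 3]` for a four-feature vector.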
Keywords :
document handling; learning (artificial intelligence); Co_CLM; TEF_WA technique; Web documents categorization; co-training algorithm; semisupervised categorization algorithm; term evaluation functions; Constraint theory; Humans; Parallel processing; Semisupervised learning; Text categorization; Training data; Vocabulary;
Conference_Title :
Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
Conference_Location :
Liaoning
Print_ISBN :
978-0-7695-2943-1
DOI :
10.1109/NPC.2007.104