DocumentCode :
1809847
Title :
A Co-training Approach based TEF-WA technique
Author :
Huanling, Tang ; Mingyu, Lu ; Na, Liu
Author_Institution :
Dalian Maritime Univ., Dalian
fYear :
2007
fDate :
18-21 Sept. 2007
Firstpage :
1021
Lastpage :
1026
Abstract :
Traditional categorization algorithms suffer from a lack of sufficient labeled training data for learning, while large amounts of unlabeled data are easily available. We investigate the co-training algorithm and its assumption that the feature set can be split into two compatible and independent views. However, this assumption is usually violated to some degree in practice, and sometimes no natural feature split exists. We therefore adopt the TEF_WA technique, which uses term evaluation functions to split the feature set and construct multiple views, from which we can choose a pair of views that are compatible and independent to a certain degree. Based on the TEF_WA technique, we develop a semi-supervised categorization algorithm, Co_CLM. Experimental results show that Co_CLM can significantly decrease the classification error by utilizing unlabeled data, especially when labeled data is sparse. Our experimental results also indicate that Co_CLM achieves more satisfactory performance with more independent view pairs.
Keywords :
document handling; learning (artificial intelligence); Co_CLM; TEF_WA technique; Web documents categorization; co-training algorithm; semisupervised categorization algorithm; term evaluation functions; Constraint theory; Humans; Parallel processing; Semisupervised learning; Text categorization; Training data; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
Conference_Location :
Liaoning
Print_ISBN :
978-0-7695-2943-1
Type :
conf
DOI :
10.1109/NPC.2007.104
Filename :
4351621