Regional Information Center for Science and Technology - Labeled and unlabeled data in text categorization

DocumentCode :

2328981

Title :

Labeled and unlabeled data in text categorization

Author :

Silva, Catarina ; Ribeiro, Bernardete

Author_Institution :

Escola Superior de Tecnologia e Gestão, Instituto Politécnico de Leiria, Portugal

Volume :

fYear :

2004

fDate :

25-29 July 2004

Firstpage :

2971

Abstract :

There is a growing interest in exploring the use of unlabeled data as a way to improve classification performance in text categorization. The ready availability of this kind of data in most applications makes it an appealing source of information. This work reports a study carried out on the Reuters-21578 corpus to evaluate the performance of support vector machines when unlabeled examples are introduced in the learning process. The improvement achieved, especially in false negative values and therefore in recall values, demonstrates that the use of unlabeled examples can be very important for small data sets.
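The link the abstract draws between false negatives and recall follows from the definition recall = TP / (TP + FN): for a fixed set of relevant documents, every false negative that is recovered becomes a true positive. The sketch below illustrates this with made-up counts; the numbers are hypothetical and are not results from the paper.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): the share of relevant documents that were retrieved."""
    relevant = true_positives + false_negatives
    if relevant == 0:
        raise ValueError("recall is undefined when there are no relevant documents")
    return true_positives / relevant


# Hypothetical counts for one category with 100 relevant test documents.
# These are illustrative assumptions, not figures reported in the paper.
baseline = recall(true_positives=70, false_negatives=30)
with_unlabeled = recall(true_positives=85, false_negatives=15)

print(f"baseline recall:       {baseline:.2f}")
print(f"with unlabeled data:   {with_unlabeled:.2f}")
```

Because the denominator stays at the number of relevant documents, halving the false negatives in this example moves recall up directly, whatever happens to precision.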

Keywords :

learning (artificial intelligence); support vector machines; text analysis; labeled data; learning process; text categorization; unlabeled data; Availability; Electronic mail; Information management; Information resources; Labeling; Machine learning; Support vector machine classification; Web sites

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on

ISSN :

1098-7576

Print_ISBN :

0-7803-8359-1

Type :

conf

DOI :

10.1109/IJCNN.2004.1381138

Filename :

1381138

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2328981