Regional Information Center for Science and Technology - Characteristics and Uses of Labeled Datasets

DocumentCode :

3520729

Title :

Characteristics and Uses of Labeled Datasets - ODP Case Study

Author :

Zhu, Dengya ; Dreher, Heinz

Author_Institution :

Sch. of Inf. Syst., Curtin Univ., Perth, WA, Australia

fYear :

2010

fDate :

1-3 Nov. 2010

Firstpage :

227

Lastpage :

234

Abstract :

Labeled datasets are essential for text categorization. They are used to train a classifier, or as a benchmark collection to evaluate categorization algorithms. However, labeling a large-scale document set is extremely expensive because it involves much human labour, and the labeling process itself is subjective rather than objective. Therefore, labels assigned to documents by only one human editor in some existing labeled document sets may be of limited use and may prove problematic for training a classifier or evaluating categorization algorithms. This research explores a socially constructed Web directory, the Open Directory Project (ODP), to generate a series of labeled document sets by extracting semantic characteristics from the ODP categories, which are annotated by a list of indexed Websites. The generated document sets are used to classify Web search results, and the results are encouraging.

Keywords :

Web sites; information retrieval; pattern classification; text analysis; ODP case study; Web directory; categorization algorithm evaluation; labeled datasets; open directory project; semantic characteristic extraction; text categorization

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Semantics Knowledge and Grid (SKG), 2010 Sixth International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4244-8125-5

Electronic_ISBN :

978-0-7695-4189-1

Type :

conf

DOI :

10.1109/SKG.2010.84

Filename :

5663513

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3520729