DocumentCode
658356
Title
Automatic Class Labeling for CiteSeerX
Author
Kashireddy, Surya Dhairya ; Gauch, Susan ; Billah, Syed Masum
Author_Institution
Comput. Sci. & Comput. Eng., Univ. of Arkansas, Fayetteville, AR, USA
Volume
1
fYear
2013
fDate
17-20 Nov. 2013
Firstpage
241
Lastpage
245
Abstract
The CiteSeerx project at the University of Arkansas uses a browsing interface based on the Association for Computing Machinery's Computing Classification System (ACM CCS). CCS contains just 369 categories, whereas the CiteSeerx database contains over 2 million documents. This results in more than 6500 documents per category, far too many to browse. To address this problem, we are exploring ways to automatically expand the CCS ontology. Previous work has focused on using clustering to automatically identify the new classes. This work focuses on how to label the subclasses in a semantically meaningful way so that they can support user browsing. We develop methods based on text mining from the subclass members to extract class labels. We evaluate three methods by comparing the suggested labels with human-assigned labels for existing categories.
Keywords
data analysis; data mining; database management systems; online front-ends; ontologies (artificial intelligence); pattern classification; text analysis; ACM CCS; Association for Computing Machinery Computing Classification System; CCS ontology; CiteSeerx project; CiteSeerx database; University of Arkansas; automatic class labeling; browsing interface; human-assigned labels; subclass members; text mining; user browsing; Clustering algorithms; Encyclopedias; Labeling; Ontologies; Programming; Semantic Web; Text mining; labeling; ontologies; text mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on
Conference_Location
Atlanta, GA
Print_ISBN
978-1-4799-2902-3
Type
conf
DOI
10.1109/WI-IAT.2013.35
Filename
6690021