DocumentCode :
1930486
Title :
Effect of Named Entities in Web Page Classification
Author :
Samarawickrama, Sameendra ; Jayaratne, Lakshman
Author_Institution :
Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka
fYear :
2012
fDate :
25-27 Sept. 2012
Firstpage :
38
Lastpage :
42
Abstract :
With the rapid growth of the World Wide Web, there is an increasing need for automated web page classification techniques. Web page classification is an important task in web mining and is used in many other areas of research as well. The general practice in classification is to use lexical terms as features. In this paper we investigate the effect of using named entities as features in web page classification. We conducted tests in five different domains (baseball, football, health, politics and science) with web pages collected from online news providers. Our results show that incorporating named entities can yield slight gains in classifier performance for narrow domains, but this does not hold for all domains. The results also show that classification based only on named entities can perform well for certain domains (e.g., baseball) but still performs worse than the representation based on lexical terms.
Keywords :
Web sites; classification; data mining; Web mining; World Wide Web; automated Web page classification; lexical terms; named entities; Accuracy; Dictionaries; Educational institutions; Feature extraction; Machine learning; Sports equipment; Web pages; named entities; web mining; web page classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence, Modelling and Simulation (CIMSiM), 2012 Fourth International Conference on
Conference_Location :
Kuantan
ISSN :
2166-8531
Print_ISBN :
978-1-4673-3113-5
Type :
conf
DOI :
10.1109/CIMSim.2012.55
Filename :
6338042