DocumentCode
234847
Title
Using a thesaurus-based approach for the categorisation of web sites
Author
Pudaruth, Sameerchand ; Ankiah, Youven ; Sembhoo, Keshav
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of Mauritius, Réduit, Mauritius
fYear
2014
fDate
7-9 Aug. 2014
Firstpage
624
Lastpage
628
Abstract
With the increasing number of Mauritian-owned websites on the internet, the need to classify them is becoming increasingly important. Our objective in this research is to classify a list of websites into seven broad categories, namely education, entertainment, government, health, tourism, sports and shopping. The homepages of three hundred and nineteen websites were used in this study. We exploited the rich source of information (features) contained in each homepage, such as the meta tags, title tag, heading tags, hyperlinks, the content of the website and the domain name of the website. This information was then used to classify each website into its most appropriate category. Several parameters, such as the weight applied to each feature and the keywords used to classify the websites, were tuned to yield better results. The experimental evaluation showed that the implemented method is highly accurate. In particular, we obtained an accuracy of about 95%, which is higher than that of all existing approaches considered so far in the research literature.
Keywords
Internet; Web sites; classification; thesauri; Mauritian-owned Web sites; Web sites categorisation; education; entertainment; government; heading tags; health; hyperlinks; meta tags; shopping; sports; thesaurus-based approach; title tag; tourism; Accuracy; Classification algorithms; Education; Government; Thesauri; Web pages; controlled vocabulary; natural language processing; thesaurus; website;
fLanguage
English
Publisher
IEEE
Conference_Titel
Contemporary Computing (IC3), 2014 Seventh International Conference on
Conference_Location
Noida
Print_ISBN
978-1-4799-5172-7
Type
conf
DOI
10.1109/IC3.2014.6897245
Filename
6897245