• DocumentCode
    234847
  • Title

    Using a thesaurus-based approach for the categorisation of web sites

  • Author

    Pudaruth, Sameerchand ; Ankiah, Youven ; Sembhoo, Keshav

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Mauritius, Réduit, Mauritius
  • fYear
    2014
  • fDate
    7-9 Aug. 2014
  • Firstpage
    624
  • Lastpage
    628
  • Abstract
    With the increasing number of Mauritian-owned websites on the internet, the need for classification is becoming increasingly important. Our objective in this research is to classify a list of websites into seven broad categories, namely education, entertainment, government, health, tourism, sports and shopping. The homepages of three hundred and nineteen websites have been used in this study. We have exploited the rich source of information (features) contained in the homepage, such as the meta tags, title tag, heading tags, hyperlinks, the content of the website and the domain name of the website. This information was then used to classify the websites into their most appropriate category. Several parameters, such as the weight applied to each feature and the keywords used to classify the websites, were tuned to yield better results. The experimental evaluation revealed that the method implemented provides very high accuracy. In particular, we obtained an accuracy of about 95%, which is higher than all existing approaches considered so far in the research literature.
  • Keywords
    Internet; Web sites; classification; thesauri; Mauritian-owned Web sites; Web sites categorisation; education; entertainment; government; heading tags; health; hyperlinks; meta tags; shopping; sports; thesaurus-based approach; title tag; tourism; Accuracy; Classification algorithms; Education; Government; Thesauri; Web pages; controlled vocabulary; natural language processing; thesaurus; website;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Contemporary Computing (IC3), 2014 Seventh International Conference on
  • Conference_Location
    Noida
  • Print_ISBN
    978-1-4799-5172-7
  • Type

    conf

  • DOI
    10.1109/IC3.2014.6897245
  • Filename
    6897245