DocumentCode :
2968120
Title :
Web page classification based on Schema.org collection
Author :
Krutil, J. ; Kudelka, Milos ; Snasel, Vaclav
Author_Institution :
Dept. of Comput. Sci., VSB Tech. Univ. of Ostrava, Ostrava Poruba, Czech Republic
fYear :
2012
fDate :
21-23 Nov. 2012
Firstpage :
356
Lastpage :
360
Abstract :
The Internet is a vast library of information, and its content needs to be categorized through web page classification. Classifying web page content can improve the quality and accuracy of web search. Unfortunately, the high dimensionality of web page datasets makes classification difficult. An automatic method for web page classification can simplify the whole process and help search engines return more relevant results. Today, information on the web is generally structured and formatted informally. This absence of semantics has motivated formal methods that add more semantic information to web pages. Search engines including Bing, Google, Yahoo! and Yandex created the Schema.org collection of schemas to support web page semantics and improve their search results. This paper explores the use of formal source code structure for classifying a large collection of web content. It focuses on using the Schema.org collection of schemas to classify web pages and categorize them unambiguously.
Keywords :
Internet; Web sites; pattern classification; search engines; search problems; source coding; Bing; Google; Web page content classification; Web page dataset dimensionality; Web page semantics; Web search quality; Yahoo!; Yandex; content categorization; formal methods; formal source code structure; information library; schema.org collection; search engine; semantics information; HTML; Search engines; Semantics; Web pages; Web search; Collection of schemas Schema.org; Genres; Microformats; Microgenres; Web Page Classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Aspects of Social Networks (CASoN), 2012 Fourth International Conference on
Conference_Location :
Sao Carlos
Print_ISBN :
978-1-4673-4793-8
Type :
conf
DOI :
10.1109/CASoN.2012.6412428
Filename :
6412428