DocumentCode
2313314
Title
Improving web search result categorization using knowledge from web taxonomy
Author
Jinarat, Supakpong ; Haruechaiyasak, Choochart ; Rungsawang, Amon
Author_Institution
Dept. of Comput. Eng., Kasetsart Univ., Bangkok
fYear
2009
fDate
6-9 May 2009
Firstpage
726
Lastpage
730
Abstract
Finding relevant information in a long list of search results returned by a general search engine can be difficult. Categorization techniques can be applied to address this problem. One possible approach is to use an external resource such as the Open Directory Project (ODP) to map the URLs of search results into ODP categories. However, the ODP can map only some of the URLs returned by a search engine. In this paper, we present a method of Web search result categorization based on a classification technique that applies external information from the ODP. First, we categorize the search results using information from the ODP to form the training data set. We then generate categorizers from the training data using a centroid-based classification algorithm to categorize the remaining uncategorized search results. In experiments, the proposed method achieved high categorization performance compared with an effective ODP classifier from previous work.
Keywords
Internet; data analysis; learning (artificial intelligence); pattern classification; search engines; Web search result categorization; Web taxonomy; centroid-based classification; data set training; open directory project; search engine; Clustering algorithms; Knowledge engineering; Laboratories; Organizing; Search engines; Taxonomy; Training data; Uniform resource locators; Web pages; Web search
fLanguage
English
Publisher
IEEE
Conference_Titel
Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2009. ECTI-CON 2009. 6th International Conference on
Conference_Location
Pattaya, Chonburi
Print_ISBN
978-1-4244-3387-2
Electronic_ISBN
978-1-4244-3388-9
Type
conf
DOI
10.1109/ECTICON.2009.5137150
Filename
5137150