DocumentCode :
179728
Title :
Enhancement of a text clustering technique for the classification of Thai tourism websites
Author :
Namahoot, Chakkrit Snae ; Lobo, Desmond ; Kabbua, Sirinan
Author_Institution :
Dept. of Comput. Sci. & Inf. Technol., Naresuan Univ., Phitsanulok, Thailand
fYear :
2014
fDate :
July 30 2014-Aug. 1 2014
Firstpage :
203
Lastpage :
208
Abstract :
Tourism is an industry that is vital to the economic development of a country. Publicity and promotion of tourism are carried out continuously, especially with the help of the Internet. When tourists need more information, they usually turn to web search engines. However, the number of search results can be huge, and the results may include unwanted information that is uncategorized and incoherent. Furthermore, the results are not presented in a single site, so extracting all the relevant information (e.g. where to travel, dine, stay, and shop) can waste time and is less convenient than gathering it from a single information source. We addressed the problem by modifying the algorithms for the classification of travel sites with a Thai text analysis technique that uses five parts of the website HTML structure: the title tag, the body tag, the meta name description, the meta name keywords, and the links to other pages. Next, we developed algorithms to analyze and categorize websites with 31 combinations, based on various website structures, and measured the efficiency using the F-measure statistic. We then compared our results with another technique, and the comparison showed that our modified technique was better. To find the best pattern among the 31 combinations, we tested the algorithms on 200 Thai tourist websites in four categories: attractions, accommodation, restaurants, and gift shops. Our results showed that the content within the HTML body tag alone was sufficient to classify the sites.
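The figure of 31 combinations presumably corresponds to the non-empty subsets of the five HTML parts (2^5 - 1 = 31), although the abstract does not say so explicitly. The following Python sketch enumerates those subsets and computes a balanced F-measure from classification counts. The part labels, the function names, and the choice of the balanced (F1) form are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations

# The five HTML parts named in the abstract; these short labels are illustrative.
PARTS = ["title", "body", "meta_description", "meta_keywords", "links"]


def part_combinations(parts):
    """Return every non-empty subset of parts, smallest subsets first."""
    return [
        combo
        for size in range(1, len(parts) + 1)
        for combo in combinations(parts, size)
    ]


def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract does not state which F-measure variant was used.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


COMBOS = part_combinations(PARTS)
print(len(COMBOS))  # 31
```

Under this reading, each of the 31 subsets would be scored separately per category, and the subset with the highest F-measure would be taken as the best pattern; the abstract reports that the body tag alone was sufficient.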
Keywords :
Web sites; pattern classification; pattern clustering; search engines; text analysis; travel industry; F-measure statistic; HTML structure; Internet; Thai text analysis technique; Thai tourism Websites; body tag; economic development; meta name description; meta name keywords; text clustering technique; title tag; Algorithm design and analysis; Classification algorithms; Equations; HTML; Mathematical model; Probability; Web pages; algorithms; metadata; ontology; web clustering; website analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Science and Engineering Conference (ICSEC), 2014 International
Conference_Location :
Khon Kaen
Print_ISBN :
978-1-4799-4965-6
Type :
conf
DOI :
10.1109/ICSEC.2014.6978195
Filename :
6978195