Author :
Namahoot, Chakkrit Snae ; Lobo, Desmond ; Kabbua, Sirinan
Author_Institution :
Dept. of Comput. Sci. & Inf. Technol., Naresuan Univ., Phitsanulok, Thailand
Abstract :
Tourism is an industry that is vital to the economic development of a country. Tourism is continuously publicized and promoted, especially over the Internet. When tourists need more information, they usually search the web with search engines. However, the number of search results can be huge, and unwanted information that is uncategorized and incoherent may be presented. Furthermore, the results are not gathered on a single site, so extracting all the relevant information, e.g. where to travel, dine, stay, and shop, and assembling it into a single information source wastes time and is inconvenient. We addressed this problem by modifying algorithms for classifying travel sites, combining a Thai text analysis technique with five parts of a website's HTML structure: the title tag, the body tag, the meta name description, the meta name keywords, and the links to other pages. Next, we developed algorithms to analyze and categorize websites using 31 combinations of these parts, based on various website structures, and measured their classification performance with the F-measure statistic. We then compared our results with those of another technique; these new results showed that our modified technique performed better. To find the best of the 31 combinations, we tested the algorithms on 200 Thai tourist websites using four categories: attractions, accommodation, restaurants, and gift shops. Our results showed that the content within the HTML body tag alone was sufficient to classify the sites.
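The abstract names five HTML parts and 31 combinations of them; 31 is exactly the number of non-empty subsets of five parts (2^5 - 1). The Python sketch below is not the authors' implementation. It shows one way, assuming a simple standard-library `html.parser` approach, to pull the five parts out of a page, enumerate the 31 combinations, and compute a balanced F-measure (F1) from counts of true positives, false positives, and false negatives. The part labels, the choice to represent a link by its `href`, the F1 form, and the sample page are hypothetical; the Thai text analysis (word segmentation) and the classifier itself are not reproduced.

```python
from html.parser import HTMLParser
from itertools import combinations

# Labels for the five HTML parts named in the abstract (names are our own, hypothetical).
PARTS = ("title", "body", "meta_description", "meta_keywords", "links")


class PartExtractor(HTMLParser):
    """Collect raw text from the five parts of a page (simplified sketch)."""

    def __init__(self):
        super().__init__()
        self.parts = {p: [] for p in PARTS}
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta":
            name = (attr.get("name") or "").lower()
            content = attr.get("content") or ""
            if name == "description":
                self.parts["meta_description"].append(content)
            elif name == "keywords":
                self.parts["meta_keywords"].append(content)
        elif tag == "a" and attr.get("href"):
            # Assumption: a "link to another page" is represented by its href target.
            self.parts["links"].append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.parts["title"].append(text)
        elif self._in_body:
            self.parts["body"].append(text)


def extract_parts(html):
    """Return {part: text} with each part's fragments joined by spaces."""
    parser = PartExtractor()
    parser.feed(html)
    parser.close()
    return {p: " ".join(chunks) for p, chunks in parser.parts.items()}


def all_combinations(parts=PARTS):
    """Every non-empty subset of the parts: 2**5 - 1 = 31 for five parts."""
    return [c for r in range(1, len(parts) + 1) for c in combinations(parts, r)]


def text_for(extracted, combo):
    """Concatenate the text of the selected parts; this is what a classifier would see."""
    return " ".join(extracted[p] for p in combo if extracted[p])


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage with a made-up page (illustrative only, not data from the study).
page = ('<html><head><title>Riverside Hotel</title>'
        '<meta name="description" content="Rooms near the river">'
        '<meta name="keywords" content="hotel, accommodation"></head>'
        '<body><p>Book a room</p><a href="/rooms">Rooms</a></body></html>')
parts = extract_parts(page)
print(len(all_combinations()))            # 31
print(text_for(parts, ("title", "body")))  # Riverside Hotel Book a room Rooms
print(f_measure(tp=1, fp=1, fn=1))         # 0.5
```

Representing each combination as a tuple of part labels lets the same classifier be rerun on 31 different concatenated inputs and scored with the same F-measure, which is one straightforward way to compare the patterns.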
Keywords :
Web sites; pattern classification; pattern clustering; search engines; text analysis; travel industry; F-measure statistic; HTML structure; Internet; Thai text analysis technique; Thai tourism Websites; body tag; economic development; meta name description; meta name keywords; text clustering technique; title tag; Algorithm design and analysis; Classification algorithms; Equations; HTML; Mathematical model; Probability; Web pages; algorithms; metadata; ontology; web clustering; website analysis;