DocumentCode :
2239412
Title :
Extracting Topics Information from Conference Web Pages Using Page Segmentation and SVM
Author :
Chen, Yaw-Huei ; Li, Sin-Sian ; Chen, Yu-Ta
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiayi Univ., Chiayi, Taiwan
fYear :
2010
fDate :
18-20 Nov. 2010
Firstpage :
270
Lastpage :
277
Abstract :
Conference web pages display their topics information in different ways, and conferences in different domains accept papers on different topics. Automatic extraction of topics information from conference web pages is thus a difficult task and has not received much attention from the research community. In this paper, we propose a method for extracting topics information that uses a web page segmentation technique, VIPS, to segment web pages into visual blocks and uses SVM to generate extraction rules. We use conference web pages retrieved from the DBWorld web site as empirical data, and experiments show that the proposed method produces satisfactory results.
Keywords :
Web sites; support vector machines; text analysis; DBWorld Web site; SVM; VIPS; conference Web pages; extraction rules; page segmentation; topics information extraction; conference topics information; extraction rule; machine learning; visual block;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Technologies and Applications of Artificial Intelligence (TAAI), 2010 International Conference on
Conference_Location :
Hsinchu City
Print_ISBN :
978-1-4244-8668-7
Electronic_ISBN :
978-0-7695-4253-9
Type :
conf
DOI :
10.1109/TAAI.2010.52
Filename :
5695464