DocumentCode
3038595
Title
Extraction technology of blog comments based on functional semantic units
Author
Chun-long, Fan ; Hui, Meng
Author_Institution
Dept. of Comput., Shenyang Aerosp. Univ., Shenyang, China
Volume
3
fYear
2012
fDate
25-27 May 2012
Firstpage
422
Lastpage
426
Abstract
Blogs are an important kind of network information resource, and extracting their comment information is essential for research such as public opinion analysis. In this paper we summarize the prevalent extraction algorithms for blog comments and describe how to use page structure in information extraction. Indicator phrases such as "Home" have clear semantics and serve a functional, indicative role when people read web pages. These indicator phrases are known as Functional Semantic Units (FSUs). Based on the characteristics of FSUs, we propose a comment information extraction model and present its design and implementation process in detail, including linearization of the page structure, identification of functional semantic units, recognition of the main text, and the comment extraction algorithm. Finally, experiments show that the comment information extraction model is effective and yields better identification results.
Keywords
Web sites; information retrieval; FSU; Web pages; blog comments; comment information extracting model; comments extraction algorithm; extraction technology; functional semantic units; network information resources; page structure; public opinion analysis; Arrays; Blogs; Data mining; Feature extraction; Layout; Semantics; Web pages; blog; comment; functional semantic unit; information extraction;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
Conference_Location
Zhangjiajie
Print_ISBN
978-1-4673-0088-9
Type
conf
DOI
10.1109/CSAE.2012.6272985
Filename
6272985