DocumentCode :
3740082
Title :
Mining User-Generated Comments
Author :
Julien Subercaze;Christophe Gravier;Frédérique
Author_Institution :
Univ. de Lyon, St. Etienne, France
Volume :
1
fYear :
2015
Firstpage :
45
Lastpage :
52
Abstract :
Social-media websites, such as newspapers, blogs, and forums, are the main places where user-generated comments are produced and exchanged. These comments are viable sources for opinion mining, descriptive annotation, and information extraction. User-generated comments are formatted using an HTML template and are therefore entwined with the other information in the HTML document. Their unsupervised extraction is thus a taxing problem, even more so when nested answers by different users must also be extracted. This paper presents a novel technique (CommentsMiner) for the unsupervised extraction of user comments. Our approach uses both the theoretical framework of frequent subtree mining and data extraction techniques. We demonstrate that the comment mining task can be modelled as a constrained closed induced subtree mining problem followed by a learning-to-rank problem. Our experimental evaluations show that CommentsMiner solves the plain comments and nested comments extraction problems for 84% of a representative and accessible dataset, while outperforming existing baseline techniques.
Keywords :
"Data mining","Vegetation","Feature extraction","HTML","User-generated content","Databases","Companies"
Publisher :
IEEE
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on
Type :
conf
DOI :
10.1109/WI-IAT.2015.138
Filename :
7396778