DocumentCode :
2456883
Title :
Automatic User Comment Detection in Flat Internet Fora
Author :
Bank, Mathias ; Mattes, Michael
Author_Institution :
Fac. for Math. & Econ., Univ. of Ulm, Ulm, Germany
fYear :
2009
fDate :
Aug. 31 2009-Sept. 4 2009
Firstpage :
373
Lastpage :
377
Abstract :
Millions of people use the World Wide Web and publish content online. This user-generated content contains much information relevant not only to marketing but also to companies in general (customer-oriented products), governments (direct democracy), and many more. Analysis of such data is becoming increasingly important. This paper deals with a prerequisite for such analysis: we propose an algorithm that automatically detects posting structures in flat internet fora in order to extract user comments. The algorithm is able to handle a wide range of different forum systems, even nested structures. The approach first detects the main content section by applying a modified version of the SST algorithm and then detects the posting structure by using several posting properties found in internet fora. It creates XPath expressions for faster data extraction in further steps.
Keywords :
Internet; data analysis; information retrieval; SST algorithm; World Wide Web; XPath expressions; automatic posting structures detection; automatic user comment detection; data analysis; data extraction; flat internet fora; user generated content; Algorithm design and analysis; Data mining; Databases; Expert systems; Internet; Mathematics; Publishing; User-generated content; Web pages; Web sites; Information Retrieval; crawler; extraction; forum; internet community; social media; web 2.0;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Application, 2009. DEXA '09. 20th International Workshop on
Conference_Location :
Linz
ISSN :
1529-4188
Print_ISBN :
978-0-7695-3763-4
Type :
conf
DOI :
10.1109/DEXA.2009.14
Filename :
5337102