DocumentCode :
3530906
Title :
Topic detection in noisy data sources
Author :
Denecke, Kerstin ; Brosowski, Marko
Author_Institution :
L3S Res. Center, Hannover, Germany
fYear :
2010
fDate :
5-8 July 2010
Firstpage :
50
Lastpage :
55
Abstract :
Automatic topic detection is becoming more important as the volume of electronically available information grows, and with it the need to process and filter that information. Determining topics correctly is particularly challenging when the language is noisy, as in weblog postings. It remains unclear to what extent existing topic detection algorithms can deal with such noisy material. In this paper, Latent Dirichlet Allocation (LDA) is used to determine topics in weblog sentences. We perform an extensive evaluation of this algorithm on real-world data from different domains. The results show that LDA can successfully determine topics even for short and noisy sentences.
Keywords :
Web sites; information filtering; Weblog sentence; automatic topic detection; latent Dirichlet allocation; noisy data sources; Accuracy; Blogs; Context; Correlation; Noise measurement; Pediatrics; Software
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Digital Information Management (ICDIM), 2010 Fifth International Conference on
Conference_Location :
Thunder Bay, ON
Print_ISBN :
978-1-4244-7572-8
Type :
conf
DOI :
10.1109/ICDIM.2010.5664202
Filename :
5664202