Title :
Topic detection in noisy data sources
Author :
Denecke, Kerstin ; Brosowski, Marko
Author_Institution :
L3S Research Center, Hannover, Germany
Abstract :
Automatic topic detection is becoming more important as the amount of electronically available information grows, and with it the need to process and filter that information. When language is noisy, as in weblog postings, determining topics correctly is particularly challenging. It remains unclear to what extent existing topic detection algorithms can deal with such noisy material. In this paper, Latent Dirichlet Allocation (LDA) is used to determine topics in weblog sentences. We perform an extensive evaluation of this algorithm on real-world data from different domains. The results show that LDA can successfully determine topics even for short and noisy sentences.
Keywords :
Web sites; information filtering; Weblog sentence; automatic topic detection; latent Dirichlet allocation; noisy data sources; Accuracy; Blogs; Context; Correlation; Noise measurement; Pediatrics; Software;
Conference_Title :
Digital Information Management (ICDIM), 2010 Fifth International Conference on
Conference_Location :
Thunder Bay, ON
Print_ISBN :
978-1-4244-7572-8
DOI :
10.1109/ICDIM.2010.5664202