Title :
Topic Detection in Instant Messages
Author :
Han Zhang ; Chang-Dong Wang ; Jian-Huang Lai
Author_Institution :
Sch. of Mobile Inf. Eng., Sun Yat-sen Univ., Zhuhai, China
Abstract :
In the past few years, instant messaging (IM) has been widely used in daily communication. However, because topics are dispersed and much of the chatting is meaningless, online IM groups are filled with useless messages. To help IM users grasp what an IM group is talking about without reading every message, topic discovery in instant messages has become a significant but challenging research task. In this paper, we propose a new method for topic detection in instant messages that is applicable to cases where 1) useless terms keep emerging, 2) the instant messages are very short, and 3) multiple languages are used. The basic step is to treat each message in an online group discussion as a data item in a message stream and then apply probabilistic latent semantic analysis (PLSA) to the collected instant messages. A strategy is designed to segment multilingual messages without machine translation and to remove the useless words that keep emerging. Extensive experiments conducted on real-world QQ group data confirm the effectiveness of the proposed method.
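As a rough illustration of the basic step described in the abstract (each message treated as one data item, with PLSA applied to the collection), the following is a minimal sketch of PLSA fitted by expectation-maximization. It is not the authors' implementation: the function names `plsa` and `top_words`, the random initialization, and the fixed iteration count are assumptions made for illustration. Messages are assumed to be already segmented into tokens with useless words removed, since the abstract does not detail that strategy.

```python
import random
from collections import Counter


def _normalize(weights):
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [x / total for x in weights]


def plsa(docs, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on tokenized messages (hypothetical helper).

    docs: list of token lists, one list per message.
    Returns (p_w_z, p_z_d, vocab), where p_w_z[z][w] = P(w|z)
    and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]  # n(d, w)

    # Random initialization of P(w|z) and P(z|d).
    p_w_z = [
        dict(zip(vocab, _normalize([rng.random() + 1e-3 for _ in vocab])))
        for _ in range(n_topics)
    ]
    p_z_d = [
        _normalize([rng.random() + 1e-3 for _ in range(n_topics)])
        for _ in docs
    ]

    for _ in range(n_iter):
        acc_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in docs]
        for d, cnt in enumerate(counts):
            for w, n in cnt.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = _normalize(
                    [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                )
                # Accumulate expected counts n(d,w) * P(z|d,w).
                for z in range(n_topics):
                    acc_w_z[z][w] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: renormalize expected counts into distributions.
        p_w_z = [
            dict(zip(vocab, _normalize([acc[w] for w in vocab])))
            for acc in acc_w_z
        ]
        p_z_d = [_normalize(row) for row in acc_z_d]

    return p_w_z, p_z_d, vocab


def top_words(p_w_z, k):
    """Return the k most probable words of each topic."""
    return [
        [w for w, _ in sorted(t.items(), key=lambda kv: -kv[1])[:k]]
        for t in p_w_z
    ]
```

Calling `plsa(messages, n_topics=2)` on a list of token lists and then `top_words(p_w_z, 5)` gives a short word summary per topic. Very short messages yield sparse counts, which is one reason the abstract describes this setting as challenging.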
Keywords :
electronic messaging; natural language processing; text analysis; PLSA; QQ group data; instant messages; instant messaging; meaningless chatting; message stream; multilingual message; multiple languages; online IM groups; topic detection; topic discovery; useless terms; Accuracy; Dispersion; Educational institutions; Instant messaging; Probabilistic logic; Semantics; Sun; Instant message; Multilingual; PLSA; Topic detection; Useless words;
Conference_Title :
Machine Learning and Applications (ICMLA), 2014 13th International Conference on
Conference_Location :
Detroit, MI
DOI :
10.1109/ICMLA.2014.41