Title :
Identification of spam comments using natural language processing techniques
Author :
Radulescu, Cristina ; Dinsoreanu, Mihaela ; Potolea, Rodica
Author_Institution :
Tech. Univ. of Cluj-Napoca, Cluj-Napoca, Romania
Abstract :
The popularity of the modern web is partly due to the growing number of content sharing applications. The social tools these applications provide allow online users to interact, to express their opinions and to read the opinions of other users. However, spammers post comments written intentionally to mislead users, redirecting them to web sites in order to increase the sites' rating and to promote products less known on the market. Reading spam comments is a bad experience and a waste of time for most online users, and it can also be harmful to the reader. Research has been performed in this domain in order to identify and eliminate spam comments. Our goal is to detect comments that are likely to be spam based on several indicators: a discontinuous text flow, inadequate and vulgar language, or content unrelated to the specific context. Our approach relies on machine learning algorithms and topic detection.
Keywords :
learning (artificial intelligence); natural language processing; social networking (online); text analysis; Web sites; content sharing applications; discontinuous text flow; machine learning algorithms; natural language processing techniques; social tools; spam comment identification; spam comment reading; topic detection; vulgar language; Context; Feature extraction; Natural language processing; Unsolicited electronic mail; White spaces; YouTube; Co-occurrence; Feature vector; Machine learning; Opinion; Post-comment similarity; Sentiment; Spam; Topic extraction;
Conference_Title :
Intelligent Computer Communication and Processing (ICCP), 2014 IEEE International Conference on
Conference_Location :
Cluj-Napoca
Print_ISBN :
978-1-4799-6568-7
DOI :
10.1109/ICCP.2014.6936976