Title :
Adaptive Distribution of Vocabulary Frequencies: A Novel Estimation Suitable for Social Media Corpus
Author :
Igawa, Rodrigo Augusto ; Sakaji Kido, Guilherme ; Seixas, Jose Luis ; Barbon, Sylvio
Author_Institution :
Dept. of Comput., State Univ. of Londrina, Londrina, Brazil
Abstract :
This paper proposes a mathematical model that evaluates the distribution of vocabulary term frequencies relative to a probabilistic ideal. The main objective is to use this evaluation to examine text denoising. The new metric is based on the classic Zipf's law statistical method. The experimental set used to test both the classic Zipf's law and the proposed model consists of books from classic literature and sets of tweets from Twitter. The main result is that the proposed model is more sensitive to the presence of text noise than Zipf's law and is asymptotically faster, which makes it suitable for social media corpora.
Keywords :
mathematical analysis; social networking (online); text analysis; Twitter; Zipf law statistic method; adaptive distribution; mathematical model; social media corpus; social media networks; text denoising; text noises; tweets sets; vocabulary frequency terms; Mathematical model; Media; Noise; Noise measurement; Noise reduction; Twitter; Vocabulary; Information Retrieval; Social Media Networks; Text preprocessing; Zipf's Law;
Conference_Title :
Intelligent Systems (BRACIS), 2014 Brazilian Conference on
Conference_Location :
Sao Paulo
DOI :
10.1109/BRACIS.2014.58