Title :
Extracting interesting related context-dependent concepts from social media streams using temporal distributions
Author :
Sayers, C.P. ; Meichun Hsu
Author_Institution :
Hewlett-Packard Labs., Palo Alto, CA, USA
Abstract :
To enable the interactive exploration of large social media datasets, we exploit the temporal distributions of word n-grams within the message stream to discover “interesting” concepts, determine “relatedness” between concepts, and find representative examples for display. We present a new algorithm for context-dependent “interestingness” using the coefficient of variation of the temporal distribution, apply the well-known technique of Pearson's correlation to tweets using equi-height histogramming to determine correlation, and employ an asymmetric variant for computing “relatedness” to encourage exploration. We further introduce techniques using interestingness, correlation, and relatedness to automatically discover concepts and select preferred word n-grams for display. These techniques are demonstrated on an 800,000-tweet dataset from the Academy Awards.
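The abstract names three building blocks: equi-height histogramming, the coefficient of variation, and Pearson's correlation. The following is a minimal sketch of how they might fit together, not the authors' implementation. The function names, the choice of bin edges taken from the whole stream, and the use of the population standard deviation are all assumptions made for illustration. The asymmetric relatedness variant is not specified in the abstract and is not attempted here.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, pstdev


def equi_height_edges(all_timestamps, n_bins):
    """Interior bin edges such that each bin holds roughly the same number
    of messages from the whole stream (assumed binning scheme)."""
    ts = sorted(all_timestamps)
    return [ts[(i * len(ts)) // n_bins] for i in range(1, n_bins)]


def histogram(timestamps, edges):
    """Count how many timestamps fall into each bin defined by the edges."""
    counts = [0] * (len(edges) + 1)
    for t in timestamps:
        counts[bisect_right(edges, t)] += 1
    return counts


def coefficient_of_variation(counts):
    """Population standard deviation of the bin counts divided by their mean.
    A flat temporal distribution scores 0; a bursty one scores higher."""
    m = mean(counts)
    return pstdev(counts) / m if m else 0.0


def pearson(xs, ys):
    """Pearson's correlation of two equal-length count vectors.
    Returns 0.0 when either vector is constant (an assumed convention)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Hypothetical toy stream: one message per time unit, 100 messages in total.
stream = list(range(100))
edges = equi_height_edges(stream, 4)

# A term that appears only in an early burst versus one spread evenly.
bursty = histogram([10, 11, 12, 13], edges)
steady = histogram([5, 30, 55, 80], edges)
```

In this toy stream the bursty term gets a higher coefficient of variation than the evenly spread one, which matches the intuition that terms concentrated in time are more "interesting" in context.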
Keywords :
Internet; information analysis; information retrieval; social networking (online); Pearson correlation; coefficient of variation; context dependent concepts; equiheight histogramming; interactive exploration; interesting concepts extraction; social media datasets; social media streams; temporal distribution; temporal distributions; word n-grams; Awards activities; Context; Correlation; Histograms; Media; Twitter; Visualization;
Conference_Title :
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-4909-3
Electronic_ISBN :
1063-6382
DOI :
10.1109/ICDE.2013.6544931