DocumentCode
2019080
Title
Large-Scale SMS Messages Mining Based on Map-Reduce
Author
Xia, Tian
Author_Institution
Key Lab. of Data Eng. & Knowledge Eng., Renmin Univ. of China, Beijing
Volume
1
fYear
2008
fDate
17-18 Oct. 2008
Firstpage
7
Lastpage
12
Abstract
Mining the popular SMS messages in a short period of time is very valuable. However, the traditional OLAP-based mining method is not suitable for such a very large-scale dataset. In this paper, we present a mining approach based on the Map-Reduce parallel framework. First, the original dataset is pre-processed and grouped by the senders' mobile numbers. Second, we transform the dataset to regroup it by short content keys, and then extract the popular messages according to the number of distinct senders that share the same key. Furthermore, we propose a sentence similarity computation method and a novel Forward Merging and K-Neighbor Checking algorithm to merge semantically similar messages. Experimental results show that the final dataset of popular messages is very small yet has a high sending coverage ratio, and that it can meet real-world requirements.
Keywords
data mining; electronic messaging; merging; parallel programming; very large databases; forward merging; k-neighbor checking algorithm; large-scale SMS messages mining; map-reduce parallel framework; sentence similarity computation method; Computational intelligence; Data engineering; Data mining; Design engineering; File systems; Laboratories; Large-scale systems; Merging; Phased arrays; Search engines; Hadoop; Map-Reduce; SMS Messages Mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Design, 2008. ISCID '08. International Symposium on
Conference_Location
Wuhan
Print_ISBN
978-0-7695-3311-7
Type
conf
DOI
10.1109/ISCID.2008.9
Filename
4725545