  • DocumentCode
    2019080
  • Title
    Large-Scale SMS Messages Mining Based on Map-Reduce
  • Author
    Xia, Tian
  • Author_Institution
    Key Lab. of Data Eng. & Knowledge Eng., Renmin Univ. of China, Beijing
  • Volume
    1
  • fYear
    2008
  • fDate
    17-18 Oct. 2008
  • Firstpage
    7
  • Lastpage
    12
  • Abstract
    Mining popular SMS messages within a short period of time is very valuable. However, the traditional OLAP-based mining method is not suitable for such a very large-scale dataset. In this paper, we present a mining approach based on the Map-Reduce parallel framework. First, the original dataset is preprocessed and grouped by the senders' mobile numbers. Second, we transform the dataset to regroup it by short content keys, and then extract the popular messages according to the number of distinct senders that share the same key. Furthermore, we propose a sentence similarity computation method and a novel Forward Merging and K-Neighbor Checking algorithm to merge semantically similar messages. Experimental results show that the final dataset of popular messages is very small yet has a high sending coverage ratio, and can meet practical requirements.
  • Keywords
    data mining; electronic messaging; merging; parallel programming; very large databases; forward merging; k-neighbor checking algorithm; large-scale SMS messages mining; map-reduce parallel framework; sentence similarity computation method; Computational intelligence; Data engineering; Data mining; Design engineering; File systems; Laboratories; Large-scale systems; Merging; Phased arrays; Search engines; Hadoop; Map-Reduce; SMS Messages Mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computational Intelligence and Design, 2008. ISCID '08. International Symposium on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-0-7695-3311-7
  • Type
    conf
  • DOI
    10.1109/ISCID.2008.9
  • Filename
    4725545
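As a rough illustration of the first two stages described in the Abstract (grouping by sender, then regrouping by short content key and counting distinct senders), the following Python sketch simulates the two Map-Reduce passes in memory. It is not the authors' Hadoop implementation. The record layout `(sender, text)`, the `content_key` rule (whitespace removed, lowercased, first 20 characters), and the `min_senders` popularity threshold are all hypothetical, because the abstract does not define them. The sketch does not model the similarity-based merging step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # hypothetical layout: (sender_number, message_text)


def content_key(text: str, prefix_len: int = 20) -> str:
    """Hypothetical 'short content key': strip whitespace, lowercase, keep a prefix.

    The abstract does not say how keys are built; this rule is an assumption.
    """
    return "".join(text.split()).lower()[:prefix_len]


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Simulate the Map-Reduce shuffle: group emitted values by key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_by_sender(sender: str, texts: List[str]) -> Iterator[Tuple[str, str]]:
    """Pass 1 reduce: a sender contributes at most once per content key,
    then (key, sender) is re-emitted so pass 2 regroups the data by key."""
    for key in {content_key(t) for t in texts}:
        yield key, sender


def popular_keys(records: Iterable[Record], min_senders: int = 2) -> Dict[str, int]:
    """Return {content_key: distinct_sender_count} for keys that reach the
    hypothetical min_senders threshold."""
    # Pass 1: map emits (sender, text); the shuffle groups messages per sender.
    by_sender = shuffle((sender, text) for sender, text in records)
    # Pass 2: regroup by content key, then count distinct senders per key.
    by_key = shuffle(
        pair
        for sender, texts in by_sender.items()
        for pair in reduce_by_sender(sender, texts)
    )
    return {
        key: len(set(senders))
        for key, senders in by_key.items()
        if len(set(senders)) >= min_senders
    }


if __name__ == "__main__":
    sample = [
        ("138001", "Happy Mid-Autumn Festival!"),
        ("138002", "happy mid-autumn festival!"),
        ("138001", "Happy  Mid-Autumn Festival!"),
        ("138003", "See you at 8"),
    ]
    print(popular_keys(sample))  # {'happymid-autumnfesti': 2}
```

In this sample, sender 138001's two greetings normalize to the same key and are counted once, so the key is credited to two distinct senders. The single "See you at 8" message falls below the threshold. Counting distinct senders rather than raw messages matches the abstract's popularity criterion. The deduplication inside the per-sender reducer keeps one heavy sender from inflating a key's count.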