• DocumentCode
    3764820
  • Title
    Density based clustering for Cricket World Cup tweets using Cosine similarity and time parameter
  • Author
    Nilang Pandey
  • Author_Institution
    LJ Institute of Engineering and Technology, PG CE Department, Ahmedabad, India
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The rapid spread of location-based devices and cheap storage mechanisms, together with the fast development of Internet technology, has enabled the collection and distribution of huge amounts of user-generated data. Such data are sometimes known as georeferenced documents: each one carries embedded location information and a time of posting, and both can be retrieved from it. The goal is to extract topics from these georeferenced documents and to determine the local topics and events for a particular region. Because the resulting geospatial clusters have arbitrary shapes, density-based clustering is the most appropriate approach. We use tweets from Twitter as the data and the DBSCAN method to generate clusters. Cosine similarity is used to measure the similarity between tweets, but because its values are low, we increase them by adding a weight based on matching keywords in the tweets. A time parameter is also used to separate clusters temporally. Results show that the weighted keyword-based method gives more specific clusters than the DBSCAN method, and that adding the time parameter yields temporally separated clusters. The method can therefore be used for information retrieval or for building marketing strategies from tweets.
  • Keywords
    "Clustering algorithms","Internet","Shape","Twitter","Clustering methods","Buildings","Mobile handsets"
  • Publisher
    ieee
  • Conference_Titel
    India Conference (INDICON), 2015 Annual IEEE
  • Electronic_ISBN
    2325-9418
  • Type
    conf
  • DOI
    10.1109/INDICON.2015.7443520
  • Filename
    7443520
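
The approach described in the abstract (cosine similarity boosted by matching keywords, a time parameter, and density-based clustering) can be illustrated with a minimal Python sketch. This is not the author's implementation: the additive `boost` per shared keyword, the thresholds `min_sim`, `max_gap` and `min_pts`, the sample tweets, and all function names are assumptions made for illustration, and geographic distance is left out for brevity.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def weighted_similarity(a, b, keywords, boost=0.2):
    """Cosine similarity plus an assumed fixed boost per shared keyword, capped at 1."""
    shared = set(a) & set(b) & keywords
    return min(1.0, cosine_similarity(a, b) + boost * len(shared))

def dbscan(tweets, keywords, min_sim=0.5, max_gap=3600, min_pts=2):
    """Cluster (tokens, timestamp_seconds) pairs; returns one label per tweet, -1 = noise.

    Two tweets are neighbours when their weighted similarity is at least
    min_sim AND their posting times differ by at most max_gap seconds.
    """
    n = len(tweets)

    def neighbors(i):
        tokens_i, time_i = tweets[i]
        return [j for j in range(n)
                if abs(tweets[j][1] - time_i) <= max_gap
                and weighted_similarity(tokens_i, tweets[j][0], keywords) >= min_sim]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:          # not a core point: provisionally noise
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:        # earlier noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:  # already assigned: do not expand again
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:   # core point: keep growing the cluster
                queue.extend(nb_j)
    return labels

# Hypothetical sample data: tokenised tweets with posting times in seconds.
tweets = [
    ("india wins the match".split(), 0),
    ("india wins toss".split(), 600),
    ("great match india".split(), 1200),
    ("india wins the match".split(), 90000),
    ("india wins again".split(), 90300),
    ("rain in london".split(), 500),
]
keywords = {"india", "match", "wins"}
labels = dbscan(tweets, keywords)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

In this toy data the second and third tweets have a plain cosine similarity of about 0.33, below the assumed threshold, and only become neighbours after the keyword boost. The two tweets posted a day later form their own cluster because of the time constraint, even though their text matches the first group.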