DocumentCode :
3764820
Title :
Density based clustering for Cricket World Cup tweets using Cosine similarity and time parameter
Author :
Nilang Pandey
Author_Institution :
LJ Institute of Engineering and Technology, PG CE Department, Ahmedabad, India
fYear :
2015
Firstpage :
1
Lastpage :
6
Abstract :
The rapid spread of location-based devices and cheap storage, together with the fast development of Internet technology, has enabled the collection and distribution of huge amounts of user-generated data. Such data are sometimes called geo-referenced documents because each document carries embedded location information and a posting time, from which both can be retrieved. The task is to extract topics from these geo-referenced documents and to identify the local topics and events of a particular region. Because the resulting clusters are geospatial and of arbitrary shape, density-based clustering is the most appropriate choice. Here, tweets from Twitter are clustered with the DBSCAN method. Similarity between tweets is measured with cosine similarity; because its values tend to be low, they are increased by adding a weight for keywords that match between tweets. A time parameter is also used to separate clusters temporally. Results show that the weighted keyword-based method gives more specific clusters than the baseline DBSCAN method, and that adding the time parameter yields temporally separated clusters. The method can therefore be used for information retrieval or for building marketing strategies from tweets.
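The abstract combines three ideas: cosine similarity between tweets, a keyword-match weight that raises low cosine values, and a time parameter that separates clusters temporally, all fed into DBSCAN. The following is a minimal sketch of how these could fit together, not the author's implementation. Several details are assumptions: the tokenizer, the additive boost of `weight` per shared keyword (capped at 1.0), the hard time window `time_eps` in seconds, and all function names are hypothetical. Geographic coordinates are omitted for brevity. DBSCAN's distance radius is expressed here as a similarity threshold (`sim_threshold`, equivalent to a distance `eps = 1 - sim_threshold`), and `min_pts` counts the point itself.

```python
import math
from collections import Counter

PUNCT = ".,!?#@:;\"'"

def tokenize(text):
    # Assumed tokenizer: split on whitespace, strip simple punctuation, lowercase
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def cosine(a, b):
    # Cosine similarity of term-frequency vectors built from two token lists
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def weighted_similarity(a, b, keywords, weight=0.1):
    # Hypothetical boost: add `weight` for each domain keyword shared by both tweets
    shared = len(set(a) & set(b) & keywords)
    return min(1.0, cosine(a, b) + weight * shared)

def dbscan(tweets, keywords, sim_threshold=0.5, time_eps=3600, min_pts=2, weight=0.1):
    # tweets: list of (text, timestamp_in_seconds); returns one label per tweet
    tokens = [tokenize(text) for text, _ in tweets]
    times = [ts for _, ts in tweets]
    n = len(tweets)

    def neighbors(i):
        # A neighbor must be close in time AND similar enough in content
        return [j for j in range(n)
                if abs(times[i] - times[j]) <= time_eps
                and weighted_similarity(tokens[i], tokens[j], keywords, weight) >= sim_threshold]

    labels = [None] * n  # None = unvisited, -1 = noise, k >= 0 = cluster id
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Illustrative (made-up) tweets with timestamps in seconds
tweets = [
    ("Kohli century India wins", 0),
    ("India wins, Kohli century!", 600),
    ("Kohli century for India", 1200),
    ("Rain delays match in Sydney", 300),
    ("Kohli century India wins", 90000),
]
keywords = {"kohli", "century", "india", "rain", "sydney"}

print(dbscan(tweets, keywords))  # [0, 0, 0, -1, -1]
```

The last tweet repeats the first one's text but falls outside the one-hour window, so it is labeled noise rather than joining cluster 0; with a very large `time_eps` it would join. This mirrors, under the stated assumptions, how a time parameter can split otherwise identical topics into separate clusters.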
Keywords :
"Clustering algorithms","Internet","Shape","Twitter","Clustering methods","Buildings","Mobile handsets"
Publisher :
ieee
Conference_Title :
India Conference (INDICON), 2015 Annual IEEE
Electronic_ISBN :
2325-9418
Type :
conf
DOI :
10.1109/INDICON.2015.7443520
Filename :
7443520