Author_Institution :
Knowledge Grid Lab., Key Lab. of Intell. Inf. Process., Beijing, China
Abstract :
With the prevalence of geo-positioning devices such as GPS receivers and smartphones, textual information, long or short, associated with GPS tags (usually a latitude-longitude coordinate) is widely encountered on the Web. For example, on Twitter, smartphones record the location of each tweet once location sharing is authorized. The spatial objects on Google Maps are also presented with textual descriptions. These data give researchers a two-dimensional perspective that fuses the location and textual dimensions. In the real world, similar concepts, discussions, product sales, and geographic objects that interest people often gather together. This paper studies three methods that combine the density-based clustering method DBSCAN with LDA (Latent Dirichlet Allocation) in different ways to discover geographical topics. The first method clusters the GPS-tagged documents and then, for each clustered region, applies LDA to the document set in that region. The second method first uses DBSCAN to find clusters among the documents, merges the documents from each cluster region into one document, and forms a single document set to which LDA is applied. The third method applies LDA to the whole document set and then extracts the LDA-generated topics that fall within a specific region found by DBSCAN. Extensive experiments are conducted with varying parameters. The results show that all three methods perform well in discovering topics in clusters, but the first two methods take much less computing time.
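As a rough illustration of how the first two methods arrange the documents before LDA is applied, the sketch below clusters a few GPS-tagged documents with a minimal DBSCAN and then builds the per-region document sets (first method) and the merged documents (second method). It is a sketch under stated assumptions, not the authors' implementation: the function names, the sample points and texts, the parameter values `eps` and `min_pts`, and the use of plain Euclidean distance on coordinates are all hypothetical, and the LDA step itself is not shown.

```python
from collections import defaultdict
from math import hypot

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN on 2-D points; returns one label per point (-1 = noise).

    Assumption: plain Euclidean distance on the coordinates, which is only
    a rough stand-in for geographic distance.
    """
    labels = [None] * len(points)

    def neighbors(i):
        xi, yi = points[i]
        # The point itself counts as one of its own neighbors.
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


def corpora_per_cluster(docs, labels):
    """First method: one document set per clustered region (noise dropped)."""
    regions = defaultdict(list)
    for doc, label in zip(docs, labels):
        if label != NOISE:
            regions[label].append(doc)
    return dict(regions)


def merged_corpus(docs, labels):
    """Second method: merge each region's documents into a single document."""
    regions = corpora_per_cluster(docs, labels)
    return [" ".join(regions[label]) for label in sorted(regions)]


# Hypothetical sample data: (latitude, longitude) pairs and their texts.
points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
          (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
          (9.0, 0.0)]
docs = ["coffee shop", "espresso bar", "coffee roaster",
        "football match", "stadium tickets", "football fans",
        "lost keys"]

labels = dbscan(points, eps=0.2, min_pts=3)
per_region = corpora_per_cluster(docs, labels)  # input to one LDA run per region
merged = merged_corpus(docs, labels)            # input to a single LDA run
```

In the first arrangement, a topic model would be fitted separately to each entry of `per_region`; in the second, it would be fitted once to `merged`, where each element stands for a whole region.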
Keywords :
Global Positioning System; geophysics computing; mobile computing; DBSCAN; GPS tags; Google map; LDA; Twitter; density-based clustering method; geo-position devices; latent Dirichlet allocation method; location-driven geographical topic discovery; product sales; smart phones; textual descriptions; textual information; Analytical models; Computers; Data mining; Educational institutions; Global Positioning System; Space vehicles; Sports equipment;