DocumentCode
265976
Title
Correlated community estimation models over a set of names
Author
Veluru, Suresh ; Rahulamathavan, Yogachandran ; Manandhar, Suresh ; Rajarajan, Muttukrishnan
Author_Institution
Inf. Security Group, City Univ. London, London, UK
fYear
2014
fDate
27-29 Aug. 2014
Firstpage
291
Lastpage
301
Abstract
Surnames (family names) and forenames evolve over generations, and they can be used to understand population origins, migration, identity, social norms and cultural customs. These forenames and surnames may have an associated hidden structure, called communities. Within each community, several forenames and surnames may be strongly correlated. In addition, correlation may exist across communities of forenames or surnames. Popular statistical generative models such as Latent Dirichlet Allocation (LDA) were developed to find topics in a corpus of documents. However, the LDA model can also be applied to identify hidden communities in a names data set. This paper proposes several variants of the LDA model to capture correlation between surnames and forenames, both within and across communities, over a set of names collected at different locations. First, we propose a surname correlated LDA model and a forename correlated LDA model. These models identify communities in surnames or forenames and extract the corresponding correlated forenames or surnames in each community, respectively. We then propose a surname community correlated LDA model and a forename community correlated LDA model. These models estimate the correlation between each surname community and the forename communities, and vice versa. We run experiments on names data sets from India and the United Kingdom and draw conclusions.
Keywords
data analysis; document handling; social sciences computing; India; LDA; United Kingdom; correlated community estimation models; latent Dirichlet allocation; surname community; Biological system modeling; Communities; Correlation; Electronic mail; Estimation; Mathematical model; Semantics; Bayesian Statistics; Probabilistic Generative Models
fLanguage
English
Publisher
IEEE
Conference_Titel
Science and Information Conference (SAI), 2014
Conference_Location
London
Print_ISBN
978-0-9893-1933-1
Type
conf
DOI
10.1109/SAI.2014.6918203
Filename
6918203
Link To Document