Title :
Selection of features for surname classification
Author :
Rachevsky, Lev ; Pu, Ken Q.
Abstract :
We study the problem of classifying surnames by country of origin using a collection of feature-based learning algorithms. As training data for the classifiers, we compiled a database of surnames and their countries of origin from publicly available databases. We propose a feature selection algorithm that dynamically selects the most prominent features of the names based on the training data. Using the selected features, we apply a number of supervised and unsupervised learning algorithms to classify the surnames by country. Finally, we compare the accuracy and performance of the different classifiers under different parameters and metrics. We demonstrate that the reduced feature set works well with well-known classifiers.
Keywords :
database management systems; pattern classification; unsupervised learning; feature based learning algorithms; feature selection algorithm; supervised learning algorithms; surname classification; surname database; unsupervised learning algorithms; Accuracy; Aerospace electronics; Computational modeling; Databases; Feature extraction; Training; Training data
Conference_Title :
Information Reuse and Integration (IRI), 2011 IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4577-0964-7
Electronic_ISBN :
978-1-4577-0965-4
DOI :
10.1109/IRI.2011.6009513