Title :
Gender prediction of Indian names
Author :
Tripathi, Anshuman ; Faruqui, Manaal
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, India
Abstract :
We present a Support Vector Machine (SVM) based classification approach for gender prediction of Indian names. We first identify and evaluate various features, derived from morphological analysis, that can be useful for such classification. We then propose a novel approach that uses n-gram-suffixes along with these features, which gives a significant advantage over the baseline approach. We believe that we are the first to use n-grams of suffixes instead of the whole word for predictor systems. Our system reports a top F1 score of 94.9%, which is expected to improve further as the training data size increases.
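The abstract does not spell out how the n-gram-suffix features are built, so the following is only a minimal sketch of one plausible reading: take the last 1 to `max_n` characters of a name as binary features. The function names `suffix_ngrams` and `suffix_features`, the `max_n` parameter, and the `sufN=` key format are all hypothetical and are not taken from the paper.

```python
def suffix_ngrams(name, max_n=3):
    """Return the character suffixes of `name` with lengths 1..max_n.

    Assumption: a "suffix n-gram" is the last n characters of the
    lowercased name. The paper may define it differently.
    """
    name = name.strip().lower()
    return [name[-n:] for n in range(1, min(max_n, len(name)) + 1)]


def suffix_features(name, max_n=3):
    """Map a name to a sparse binary feature dict keyed by suffix length."""
    return {f"suf{len(s)}={s}": 1 for s in suffix_ngrams(name, max_n)}


if __name__ == "__main__":
    # Illustrative input only; not data from the paper.
    print(suffix_features("Anita"))
```

Feature dicts of this shape could then be vectorized and passed to an SVM classifier together with the morphological features; the abstract does not say which features, kernel, or toolkit the authors used.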
Keywords :
classification; gender issues; support vector machines; Indian name gender prediction; morphological analysis; n-gram-suffixes; predictor systems; support vector machine based classification; Computer science; Electronic mail; Kernel; Natural language processing; Support vector machines; Training; Training data
Conference_Title :
Students' Technology Symposium (TechSym), 2011 IEEE
Conference_Location :
Kharagpur
Print_ISBN :
978-1-4244-8941-1
DOI :
10.1109/TECHSYM.2011.5783842