Title :
Automatic recognition of major language families in India
Author :
Sengupta, Dipak ; Saha, Gobinda
Author_Institution :
Dept. of Electron. & Electr. Commun. Eng., Indian Inst. of Technol., Kharagpur, Kharagpur, India
Abstract :
India is a vast country with a large number of languages. Some of these languages descend from a single mother language and together form a language family. The major official languages of India fall under two language families, namely Indo-European and Dravidian. In this paper, we describe a system that takes a speech file as input and identifies the language family to which it belongs. We also use this system to study the influence of the Dravidian family on the Indo-European family. The system uses a combination of Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Coefficients (SDC) as language-specific features. At present, SDC is the most popular feature for language identification. It captures temporal information of speech over a broad range of time. A Gaussian Mixture Model (GMM) based approach is used to model the language families, where the distribution of the feature vectors of each class is approximated by a weighted sum of Gaussians. The results give interesting insights into certain Indian languages and into the applicability of machine learning in this domain.
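The feature and scoring steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the SDC parameters (d=1, p=3, k=7), the edge-padding strategy, the diagonal covariances, and the function names are all assumptions, since the abstract does not give them. The first function stacks k delta blocks, each computed as c(t + i*p + d) - c(t + i*p - d), and the second returns the average per-frame log-likelihood of a feature sequence under one GMM.

```python
import numpy as np


def shifted_delta_coefficients(cepstra, d=1, p=3, k=7):
    """Stack k shifted delta blocks for each frame.

    cepstra: array of shape (T, N), one cepstral vector per frame.
    Returns an array of shape (T, N * k).
    The defaults d=1, p=3, k=7 are an assumed, commonly used setting.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]
    # Repeat edge frames so every shifted index is valid (assumed choice).
    pad_after = (k - 1) * p + d
    padded = np.pad(cepstra, ((d, pad_after), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start = i * p + d  # padded index of frame t + i*p when t = 0
        ahead = padded[start + d:start + d + num_frames]
        behind = padded[start - d:start - d + num_frames]
        blocks.append(ahead - behind)
    return np.hstack(blocks)


def gmm_avg_loglik(features, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    features: (T, D); weights: (M,); means and variances: (M, D).
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = features[:, None, :] - means[None, :, :]
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    log_weighted = log_comp + np.log(weights)
    # Log-sum-exp over mixture components for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())
```

Under these assumptions, one GMM would be trained per language family on the combined MFCC and SDC vectors, and a test utterance would be assigned to the family whose model gives the higher average log-likelihood.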
Keywords :
Gaussian processes; learning (artificial intelligence); natural language processing; Dravidian family; Gaussian mixture model; Indian languages; Indo-European family; MFCC; Mel Frequency Cepstral Coefficients; SDC; Shifted Delta Coefficients; automatic recognition; language family; language identification; language specific features; machine learning; major language families; speech temporal information; Accuracy; Mel frequency cepstral coefficient; Speech; Training; GMM
Conference_Title :
Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
Conference_Location :
Kharagpur
Print_ISBN :
978-1-4673-4367-1
DOI :
10.1109/IHCI.2012.6481844