Title :
A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation
Author :
Shah, Mubarak ; Miao, Lifeng ; Chakrabarti, Chaitali ; Spanias, A.
Author_Institution :
Sch. of Electr., Arizona State Univ., Tempe, AZ, USA
Abstract :
In this paper, we present a speech-based emotion recognition framework built on a latent Dirichlet allocation model. This method assumes that incoming speech frames are conditionally independent and exchangeable. While this leads to a loss of temporal structure, it is able to capture significant statistical information between frames. In contrast, a hidden Markov model-based approach captures the temporal structure in speech. Using the German emotional speech database EMO-DB for evaluation, we achieve an average classification accuracy of 80.7% compared to 73% for hidden Markov models. This improvement is achieved at the cost of a slight increase in computational complexity. We map the proposed algorithm onto an FPGA platform and show that emotions in a speech utterance of duration 1.5 s can be identified in 1.8 ms, while utilizing 70% of the FPGA resources. This further demonstrates the suitability of our approach for real-time applications on hand-held devices.
Keywords :
computational complexity; emotion recognition; field programmable gate arrays; hidden Markov models; speech recognition; EMO-DB; FPGA platform; German emotional speech database; hand-held devices; latent Dirichlet allocation model; speech emotion recognition framework; speech utterance; statistical information; temporal structure; time 1.5 s; time 1.8 ms; Abstracts; Complexity theory; Educational institutions; Speech; FPGA implementation; affective computing; latent Dirichlet allocation
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6638116