DocumentCode :
3703409
Title :
Understanding speaking styles of internet speech data with LSTM and low-resource training
Author :
Xixin Wu;Zhiyong Wu;Yishuang Ning;Jia Jia;Lianhong Cai;Helen Meng
Author_Institution :
Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Shenzhen Key Laboratory of Information Science and Technology, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
fYear :
2015
Firstpage :
815
Lastpage :
820
Abstract :
Speech is widely used to express one's emotions, intentions, desires, etc. in social network communication, producing an abundance of internet speech data with different speaking styles. Such data provides a good resource for social multimedia research. However, because different styles are mixed together in internet speech data, classifying such data remains a challenging problem. In previous work, utterance-level statistics of acoustic features were used as features for classifying speaking styles, ignoring local context information. The long short-term memory (LSTM) recurrent neural network (RNN) has achieved notable success in many research areas, such as speech recognition. It can retain context information over long time spans, which is important in characterizing speaking styles. Training an LSTM requires a large amount of labeled training data, but for internet speech data classification it is quite difficult to obtain labeled data at such a scale. On the other hand, publicly available data exists for other tasks (such as speech emotion recognition), which offers a new possibility for exploiting LSTM in this low-resource task. We adopt a retraining strategy to train an LSTM to recognize speaking styles in speech data: the network is trained on emotion and speaking style datasets sequentially, without resetting its weights in between. Experimental results demonstrate that retraining improves both the training speed and the accuracy of the network in speaking style classification.
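As a rough illustration of the abstract's point that utterance-level statistics ignore local context, the Python sketch below compares two pitch contours that contain the same frame values in opposite order. It is not taken from the paper: the pitch values and the helper names `utterance_stats` and `deltas` are hypothetical, chosen only for this example.

```python
from statistics import mean, pstdev


def utterance_stats(frames):
    """Collapse a frame-level contour into utterance-level statistics."""
    return (mean(frames), pstdev(frames), min(frames), max(frames))


def deltas(frames):
    """Frame-to-frame differences, which preserve the order of the frames."""
    return [b - a for a, b in zip(frames, frames[1:])]


# Hypothetical per-frame pitch values in Hz (illustrative, not from the paper).
rising = [100.0, 120.0, 140.0, 160.0, 180.0]
falling = list(reversed(rising))

stats_rising = utterance_stats(rising)
stats_falling = utterance_stats(falling)

# The two contours move in opposite directions, yet their
# utterance-level statistics are identical.
print(stats_rising == stats_falling)

# Order-aware features still tell the two contours apart.
print(deltas(rising))
print(deltas(falling))
```

A classifier that sees only `utterance_stats` cannot separate the two contours, whereas a model that reads the frames in order, such as the LSTM described in the abstract, has access to the difference.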
Keywords :
"Speech","Training","Speech recognition","Training data","Context","Internet","Recurrent neural networks"
Publisher :
IEEE
Conference_Titel :
Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on
Electronic_ISBN :
2156-8111
Type :
conf
DOI :
10.1109/ACII.2015.7344667
Filename :
7344667