Title :
Automatic speech data clustering with human perception based weighted distance
Author :
Xixin Wu ; Zhiyong Wu ; Jia Jia ; Hsiang-Yun Meng ; Lianhong Cai ; Weifeng Li
Author_Institution :
Shenzhen Key Lab. of Inf. Sci. & Technol., Tsinghua Univ., Shenzhen, China
Abstract :
Speech data from the Internet contain different speaking styles related to information genre, emotion, sentiment, speaker characteristics, etc. Automatically classifying such data remains challenging because it is difficult to define categories that clearly characterize different speaking styles. To address this problem, this paper proposes a method based on x-means clustering, an extension of k-means that does not require a fixed number of classes. The x-means method clusters data according to a predefined distance measure computed over different features. Existing distance measures consider only the features themselves and ignore how those features affect human perception. To derive a more reasonable distance measure, this paper also proposes a human perception based weighted distance that captures the contribution of different acoustic features to human perception. In this way, automatic classification of Internet speech data makes use of prior knowledge of human perception while also capturing the speaking-style characteristics of different datasets with varying categories. Listening-test experiments demonstrate that introducing prior knowledge of human perception into the distance measure is useful, and the proposed method outperforms a baseline using the conventional Euclidean distance, improving classification accuracy by 10%.
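The abstract does not specify how the perceptual weights are estimated or how x-means is configured. The Python sketch below shows one way to combine the two ideas, assuming the per-feature weights w_i are already given (the values used are hypothetical, not from the paper). It relies on the fact that a weighted Euclidean distance equals a plain Euclidean distance after scaling feature i by sqrt(w_i), then runs a basic x-means loop: k-means, followed by a local BIC test on each cluster that decides whether to split it in two. The BIC assumes identical spherical Gaussians; the splitting rule, initialization, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_distance(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Euclidean distance sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def rescale(data: Sequence[Sequence[float]], w: Sequence[float]) -> List[Vector]:
    """Scale feature i by sqrt(w_i); plain Euclidean distance on the result
    equals weighted_distance on the original features."""
    return [[xi * math.sqrt(wi) for xi, wi in zip(x, w)] for x in data]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points: List[Vector], centers: List[Vector]) -> List[List[Vector]]:
    """Assign each point to its nearest center (ties go to the lower index)."""
    clusters: List[List[Vector]] = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
        clusters[j].append(p)
    return clusters


def kmeans(points, centers, iters=100) -> Tuple[List[Vector], List[List[Vector]]]:
    """Lloyd's k-means from the given centers; an empty cluster keeps its center."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        clusters = assign(points, centers)
        new = [mean(cl) if cl else c for c, cl in zip(centers, clusters)]
        if new == centers:
            break
        centers = new
    return centers, assign(points, centers)


def bic(clusters: List[List[Vector]], centers: List[Vector]) -> float:
    """BIC of a hard clustering under identical spherical Gaussians."""
    R = sum(len(c) for c in clusters)
    K, M = len(centers), len(centers[0])
    if R <= K:
        return -math.inf
    sse = sum(sqdist(p, mu) for cl, mu in zip(clusters, centers) for p in cl)
    var = max(sse / (R - K), 1e-12)  # pooled variance estimate; guard against log(0)
    loglik = (sum(n * math.log(n) for n in map(len, clusters) if n > 0)
              - R * math.log(R)
              - R * M / 2 * math.log(2 * math.pi * var)
              - (R - K) / 2)
    n_params = K * (M + 1)  # (K-1) mixing weights + K*M center coords + 1 variance
    return loglik - n_params / 2 * math.log(R)


def split(pts: List[Vector]):
    """Try 2-means inside one cluster, seeded with two far-apart points."""
    c = mean(pts)
    a = max(pts, key=lambda p: sqdist(p, c))
    b = max(pts, key=lambda p: sqdist(p, a))
    if sqdist(a, b) == 0:
        return None  # all points identical: nothing to split
    centers, clusters = kmeans(pts, [a, b])
    return None if any(not cl for cl in clusters) else (centers, clusters)


def xmeans(points: List[Vector], k_max: int = 10):
    """Grow k from 1: keep a 2-way split of a cluster only if its local BIC improves."""
    centers = [mean(points)]
    while len(centers) < k_max:
        centers, clusters = kmeans(points, centers)
        new_centers: List[Vector] = []
        for i, (c, pts) in enumerate(zip(centers, clusters)):
            remaining = len(centers) - i - 1  # parents not yet examined
            child = split(pts) if len(pts) >= 2 else None
            if child is not None and len(new_centers) + 2 + remaining <= k_max:
                child_centers, child_clusters = child
                if bic(child_clusters, child_centers) > bic([pts], [mean(pts)]):
                    new_centers.extend(child_centers)
                    continue
            new_centers.append(c)
        if len(new_centers) == len(centers):
            break  # no cluster was worth splitting
        centers = new_centers
    return kmeans(points, centers)


# Usage on toy data: two tight 3x3 grids far apart in a 2-D feature space.
grid = [[i * 0.1, j * 0.1] for i in range(3) for j in range(3)]
data = grid + [[x + 10.0, y + 10.0] for x, y in grid]
w = [0.7, 0.3]  # hypothetical perceptual weights, not values from the paper
centers, clusters = xmeans(rescale(data, w), k_max=5)
print(len(centers), sorted(len(c) for c in clusters))
```

On this toy data the loop should stop at two clusters, one per grid. Folding the weights into the features, rather than passing a custom distance into every step, keeps k-means centroids as plain means; that shortcut is valid only because the weighted distance is still Euclidean after rescaling.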
Keywords :
Internet; pattern classification; pattern clustering; speech processing; automatic Internet speech data classification; automatic speech data clustering; human perception based weighted distance; predefined distance measurement; speaking style characteristics; speaking styles; x-means clustering; Accuracy; Acoustics; Distance measurement; Speech; Speech recognition; Text recognition; feature weights; human perception; speech clustering; x-means
Conference_Title :
2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Conference_Location :
Singapore
DOI :
10.1109/ISCSLP.2014.6936604