Regional Information Center for Science and Technology (RICEST) - Automatic speech data clustering with human perception based weighted distance

DocumentCode :

134212

Title :

Automatic speech data clustering with human perception based weighted distance

Author :

Xixin Wu ; Zhiyong Wu ; Jia Jia ; Meng, Hsiang-Yun ; Lianhong Cai ; Weifeng Li

Author_Institution :

Shenzhen Key Lab. of Inf. Sci. & Technol., Tsinghua Univ., Shenzhen, China

fYear :

2014

fDate :

12-14 Sept. 2014

Firstpage :

216

Lastpage :

220

Abstract :

Speech data from the Internet contain different speaking styles related to information genre, emotion, sentiment, speaker character, and so on. Automatic classification of such data remains challenging because it is difficult to define categories that clearly characterize different speaking styles. To address this problem, this paper proposes a method based on x-means clustering, an extension of k-means that does not require a fixed number of classes. The x-means method clusters the data according to a predefined distance measure over different features. Current distance measures focus only on the features themselves and ignore the impact of those features on human perception. To derive a more reasonable distance measure, this paper also proposes a human-perception-based weighted distance that captures the contribution of different acoustic features to human perception. In this way, the automatic classification of Internet speech data makes use of prior knowledge of human perception while capturing the speaking-style characteristics of different datasets with varying categories. Listening-test experiments demonstrate that introducing prior knowledge of human perception into the distance measure is useful, and that the proposed method outperforms a baseline using the conventional Euclidean distance, with a 10% improvement in classification accuracy.
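
The idea of a feature-weighted distance in the abstract can be illustrated with a minimal Python sketch. This record does not give the paper's formula or weights, so the weighted Euclidean form, the function names `weighted_distance` and `assign_to_nearest`, and all numbers below are assumptions made for illustration only. The sketch shows just the nearest-centroid assignment step shared by k-means-style methods; it does not implement x-means or its selection of the number of clusters.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2).

    Assumed form for illustration; the paper's exact measure is not
    given in this record.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def assign_to_nearest(points, centroids, weights):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(
            range(len(centroids)),
            key=lambda k: weighted_distance(p, centroids[k], weights),
        )
        for p in points
    ]


# Hypothetical two-dimensional acoustic feature vectors and centroids.
points = [(0.0, 0.0), (0.0, 3.0), (2.0, 0.0)]
centroids = [(0.0, 4.0), (3.0, 0.0)]

# Uniform weights reduce to the conventional Euclidean distance.
uniform = [1.0, 1.0]
# Hypothetical weights that down-weight the second feature, standing in
# for weights that would be derived from human perception.
perceptual = [1.0, 0.1]

print(assign_to_nearest(points, centroids, uniform))
print(assign_to_nearest(points, centroids, perceptual))
```

With these made-up values, the first point changes cluster once the second feature is down-weighted, which is the kind of effect a perception-based weighting is meant to produce.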

Keywords :

Internet; pattern classification; pattern clustering; speech processing; automatic Internet speech data classification; automatic speech data clustering; human perception based weighted distance; predefined distance measurement; speaking style characteristics; speaking styles; x-means clustering; Accuracy; Acoustics; Distance measurement; Internet; Speech; Speech recognition; Text recognition; feature weights; human perception; speech clustering; x-means;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location :

Singapore

Type :

conf

DOI :

10.1109/ISCSLP.2014.6936604

Filename :

6936604

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=134212