DocumentCode :
1887866
Title :
Acoustic space analysis method utilizing statistical multidimensional scaling technique
Author :
Shozakai, M. ; Nagino, G.
Author_Institution :
Asahi Kasei Corp., Japan
fYear :
2005
fDate :
18-20 May 2005
Firstpage :
37
Abstract :
Summary form only given. In order to achieve sufficient improvement in speaker-adaptation techniques, such as the MLLR method, it is essential to obtain an adequate number of samples of the user's voice, rendering the application of the method difficult in practical environments. Prior development of a library of highly precise acoustic models is necessary to ensure high enough speech recognition performance from the outset of using the system. It is quite important to analyze a target acoustic space to design an efficient acoustic model library. However, the analysis of multidimensional acoustic space is generally a difficult task. In order to support the analysis of acoustic space through the capability of human visual perception, we proposed the COSMOS (COmprehensive Space Map of Objective Signal, previously aCOustic Space Map Of Sound) method. It features the visualization of an aggregate of acoustic models based on stochastic models, such as HMM and GMM, into a two-dimensional map (called COSMOS map) by utilizing a statistical multidimensional scaling technique of nonlinear projection. First, the paper formulates the COSMOS method. Then, a quantitative analysis of a speaking style COSMOS map is described. Error analysis of the mapping from multidimensional space to two-dimensional space in the COSMOS map is investigated. Furthermore, it is suggested that there exist multiple radiated axes of acoustic feature continuity in the COSMOS map.
Keywords :
Gaussian processes; acoustic signal processing; adaptive signal processing; hidden Markov models; multidimensional signal processing; speech recognition; statistical analysis; visual perception; GMM; HMM; acoustic feature continuity; acoustic model library; acoustic space analysis method; acoustic space map of sound; comprehensive space map of objective signal; human visual perception; nonlinear projection; speaker-adaptation techniques; speaking style; statistical multidimensional scaling; stochastic models; Aggregates; Hidden Markov models; Humans; Libraries; Maximum likelihood linear regression; Multidimensional systems; Signal analysis; Speech recognition; Visual perception; Visualization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Nonlinear Signal and Image Processing, 2005. NSIP 2005. Abstracts. IEEE-Eurasip
Conference_Location :
Sapporo
Print_ISBN :
0-7803-9064-4
Type :
conf
DOI :
10.1109/NSIP.2005.1502287
Filename :
1502287