Title :
Implementation of the POW (phonetically optimized words) algorithm for speech database
Author :
Lim, Yeonja ; Lee, Youngjik
Author_Institution :
Autom. Interpretation Section, Electron. & Telecommun. Res. Inst., Seoul, South Korea
Abstract :
The paper proposes the concept of the POW (phonetically optimized words) set. A speech database should include all possible phonological phenomena and, preferably, have the same phonological distribution as general speech. For this purpose, the authors suggest a new algorithm for selecting a word set with three properties: (1) it includes all phonological events, (2) it contains the minimum number of words, and (3) the phonological similarity between the POW set and the high-frequency word set is maximized. The authors extract the Korean POW set from 50,000 high-frequency words taken from a 3 million text corpus. The POW set is much more similar to the high-frequency word set than the PBW (phonetically balanced words) set is, and it contains fewer words.
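The three selection criteria in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out. It is a generic greedy set-cover heuristic under several assumptions: character bigrams stand in for phonological events, a smoothed KL divergence stands in for the similarity measure, and the names `bigram_events`, `divergence`, and `select_word_set` are hypothetical. A greedy cover is also only an approximation and does not guarantee the true minimum number of words.

```python
from collections import Counter
import math


def bigram_events(word):
    """Toy stand-in for phonological events (assumption): character
    bigrams of the word, with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def divergence(selected_counts, reference_counts):
    """KL(reference || selected), with add-one smoothing on the selected
    counts. Assumed similarity measure; lower means more similar."""
    vocab = list(reference_counts)
    ref_total = sum(reference_counts.values())
    sel_total = sum(selected_counts[e] for e in vocab) + len(vocab)
    kl = 0.0
    for e in vocab:
        p = reference_counts[e] / ref_total
        q = (selected_counts[e] + 1) / sel_total
        kl += p * math.log(p / q)
    return kl


def select_word_set(word_freqs, events_of=bigram_events):
    """Greedily pick words until every event seen in word_freqs is covered.

    word_freqs maps each candidate word to its corpus frequency.
    """
    # Frequency-weighted event distribution of the high-frequency word list.
    reference = Counter()
    for word, freq in word_freqs.items():
        for event in events_of(word):
            reference[event] += freq

    uncovered = set(reference)
    candidates = set(word_freqs)
    selected, selected_counts = [], Counter()

    while uncovered:
        def score(word):
            events = events_of(word)
            gain = len(uncovered.intersection(events))
            trial = selected_counts + Counter(events)
            # Most newly covered events first; ties go to the word that
            # keeps the selected set closest to the reference distribution.
            return (-gain, divergence(trial, reference), word)

        best = min(candidates, key=score)
        selected.append(best)
        selected_counts.update(events_of(best))
        uncovered.difference_update(events_of(best))
        candidates.remove(best)

    return selected


if __name__ == "__main__":
    toy_freqs = {"banana": 5, "ban": 3, "nab": 2, "an": 1}
    print(select_word_set(toy_freqs))
```

Coverage is prioritized over similarity here purely to keep the sketch short; a real implementation would need actual phonological transcriptions in place of the bigram stand-in.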
Keywords :
natural languages; speech recognition; Korean; POW; algorithm; phonetically optimized words; phonological distribution; speech database; word set; Databases; Entropy; Error analysis; Frequency; Information theory; Large-scale systems; Speech recognition;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on
Conference_Location :
Detroit, MI
Print_ISBN :
0-7803-2431-5
DOI :
10.1109/ICASSP.1995.479280