Automatic discovery of a phonetic inventory for unwritten languages for statistical speech synthesis

Author

Muthukumar, Prasanna Kumar ; Black, Alan W.

Author_Institution

Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA

fYear

2014

fDate

4-9 May 2014

Firstpage

2594

Lastpage

2598

Abstract

Speech synthesis systems are typically built from speech data and transcriptions. In this paper, we try to build synthesis systems when neither transcriptions nor knowledge about the language is available. Building a synthesizer usually requires at least phonetic knowledge about the language. We propose an automated way of obtaining phones and phonetic knowledge about the corpus at hand by making use of Articulatory Features (AFs). An Articulatory Feature predictor is trained on a bootstrap corpus in an arbitrary other language using a neural network with three hidden layers. This neural network is run on the speech corpus to extract AFs. Hierarchical clustering is used to cluster the AFs into categories, i.e., phones. Phonetic information about each of these inferred phones is obtained by computing the mean of the AFs in each cluster. Results of systems built with this framework in multiple languages are reported.
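
The clustering and averaging steps described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three feature dimensions (voiced, nasal, vowel), the hand-written frame values standing in for neural network outputs, the Euclidean distance, the average-linkage merge rule, and the target of three clusters. The function names `agglomerative_cluster` and `infer_phones` are hypothetical.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def agglomerative_cluster(frames, num_clusters):
    """Bottom-up hierarchical clustering with average linkage (an assumed
    choice). Returns a list of clusters, each a list of frame indices."""
    clusters = [[i] for i in range(len(frames))]
    while len(clusters) > num_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the members of two clusters.
            total = sum(
                euclidean(frames[i], frames[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        # a < b always holds, so deleting b does not shift position a.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def infer_phones(frames, num_clusters):
    """Cluster AF frames and describe each inferred phone by the mean
    of the AF vectors assigned to it."""
    clusters = agglomerative_cluster(frames, num_clusters)
    return [
        (sorted(members), mean_vector([frames[i] for i in members]))
        for members in clusters
    ]


# Hypothetical AF frames: [voiced, nasal, vowel], values in [0, 1].
frames = [
    [0.9, 0.1, 0.9],
    [1.0, 0.0, 0.8],
    [0.9, 0.9, 0.1],
    [1.0, 0.8, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]

phones = infer_phones(frames, num_clusters=3)
for members, centroid in phones:
    print(members, [round(v, 2) for v in centroid])
```

Each printed centroid can be read as a rough phonetic description of the inferred unit: a cluster whose mean is high on the assumed voiced and vowel dimensions behaves like a vowel, while one that is high on voiced and nasal behaves like a nasal consonant. The naive pairwise search is only suitable for toy inputs; a real corpus would need a more efficient clustering implementation.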

Keywords

neural nets; pattern clustering; speech synthesis; statistical analysis; AF; articulatory feature predictor; bootstrap corpus; hierarchical clustering; phonetic inventory; phonetic knowledge; speech corpus; speech data; speech transcriptions; statistical speech synthesis; three-hidden layer neural network; unwritten languages; Feature extraction; Speech; Speech recognition; Synthesizers; TTS without text; articulatory features; neural networks; un-labeled speech corpora

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854069

Filename

6854069