Accurate speech segmentation by mimicking human auditory processing

Author

King, Simon; Hasegawa-Johnson, Mark

Author_Institution

Dept. of Electr. & Comput. Eng., Univ. of Illinois, Urbana, IL, USA

fYear

2013

Firstpage

8096

Lastpage

8100

Abstract

This paper addresses the problem of locating phone boundaries without prior knowledge of the text of an utterance. A biomimetic model of human auditory processing is used to calculate two neural features: frequency synchrony and average signal level. These features are input to a two-layered support vector machine (SVM)-based system that detects phone boundaries. Phone boundaries are detected with 87.0% precision and 84.8% recall when the automatic segmentation system has no prior knowledge of the phone sequence in the utterance.
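The abstract reports boundary precision and recall but does not say how a detected boundary is matched to a reference boundary. The sketch below shows one common way such figures are computed. It assumes a ±20 ms tolerance window and greedy one-to-one matching. Neither assumption is taken from the paper, and the function names `boundary_precision_recall` and `f_measure` are hypothetical. The harmonic mean of the reported 87.0% precision and 84.8% recall works out to about 85.9%.

```python
from typing import Optional, Sequence, Tuple


def boundary_precision_recall(
    detected: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 0.02,  # assumed ±20 ms window; not stated in the abstract
) -> Tuple[float, float]:
    """Score detected phone boundaries (in seconds) against reference ones.

    Hypothetical helper: each detected boundary is greedily matched to the
    nearest still-unmatched reference boundary within `tolerance`, so a
    reference boundary can be credited at most once.
    """
    unmatched = sorted(reference)
    hits = 0
    for t in sorted(detected):
        best_index: Optional[int] = None
        best_dist = tolerance
        # Find the closest unmatched reference boundary inside the window.
        for i, r in enumerate(unmatched):
            d = abs(t - r)
            if d <= best_dist:
                best_index, best_dist = i, d
        if best_index is not None:
            unmatched.pop(best_index)  # consume it: one-to-one matching
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: three of four detections fall within 20 ms of a reference.
    p, r = boundary_precision_recall(
        [0.10, 0.25, 0.51, 0.80], [0.10, 0.26, 0.50, 0.70, 0.90]
    )
    print(p, r)  # 0.75 0.6
    # Harmonic mean of the figures reported in the abstract.
    print(round(f_measure(0.870, 0.848), 3))  # 0.859
```

Greedy one-to-one matching keeps a burst of detections near a single true boundary from inflating precision. A stricter evaluation could use optimal bipartite matching instead, which may differ from the greedy result in crowded regions.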

Keywords

speech processing; synchronisation; automatic segmentation system; biomimetic model; frequency synchrony; human auditory processing; neural features; phone boundaries location; phone sequence; signal level; speech segmentation; two-layered support vector machine-based system; two-layered-based system; Computational modeling; Frequency synchronization; Ice; Speech; Speech recognition; Support vector machines; Training; Automatic segmentation; auditory modeling; average signal level; frequency synchrony

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6639242

Filename

6639242