A Speech-and-Speaker Identification System: Feature Extraction, Description, and Classification of Speech-Signal Image

Author

Saeed, Khalid ; Nammous, Mohammad Kheir

Author_Institution

Bialystok Tech. Univ.

Volume

54

Issue

2

fYear

2007

fDate

4/1/2007

Firstpage

887

Lastpage

897

Abstract

This paper discusses a speech-and-speaker (SAS) identification system based on spoken Arabic digit recognition. The speech signals of the Arabic digits from zero to ten are processed graphically (the signal is treated as an object image for further processing). Identification and classification rely on Burg's estimation model and the algorithm of Toeplitz matrix minimal eigenvalues as the main tools for signal-image description and feature extraction. At the classification stage, both conventional and neural-network-based methods are used. The success rate of the speaker-identifying system obtained in the presented experiments for individually uttered words reached about 98.8% in some cases. The miss rate of about 1.2% was due almost entirely to false acceptance (13 miss cases in 1100 tested voices). These results led to the design of a security system for SAS identification. The average overall success rate was then 97.45% in recognizing one uttered word and identifying its speaker, and 92.5% in recognizing a three-digit password (three individual words). The latter is a high success rate because, for compound cases, all three uttered words must be recognized correctly in sequence after their speaker has been identified; hence, the probability of making an error is higher. The authors' major contribution to this task involves building a system to recognize both the uttered words and their speaker through an innovative graphical algorithm for feature extraction from the voice signal. This Toeplitz-based algorithm reduces the amount of computation from operations on an n × n matrix that contains n² different elements to operations on a matrix (of Toeplitz form) that contains only n elements that are different from each other.
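The element-count claim in the last sentence of the abstract can be illustrated with a minimal sketch. It assumes a symmetric Toeplitz form, since a general Toeplitz matrix has up to 2n − 1 distinct elements and only the symmetric (or triangular) form has n. The function names and the coefficient values are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
def toeplitz(first_column):
    """Build a symmetric Toeplitz matrix: entry (i, j) depends only on |i - j|."""
    n = len(first_column)
    return [[first_column[abs(i - j)] for j in range(n)] for i in range(n)]


def distinct_count(matrix):
    """Count how many different values appear anywhere in the matrix."""
    return len({value for row in matrix for value in row})


# Hypothetical coefficients, chosen only for illustration.
coefficients = [4.0, 1.0, 0.5, 0.25]
T = toeplitz(coefficients)

for row in T:
    print(row)

# The 4 x 4 matrix has 16 entries but is fully determined by its 4 coefficients.
print(len(T) * len(T), "entries,", distinct_count(T), "distinct values")
```

Because every diagonal repeats a single value, the whole matrix can be stored and manipulated through its first column alone.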

Keywords

Toeplitz matrices; eigenvalues and eigenfunctions; error statistics; neural nets; security; speaker recognition; speech processing; Burg estimation model; Toeplitz matrix minimal eigenvalue; conventional-neural-network; error probability; graphical algorithm; security system; speech-signal image; speech-speaker identification system; spoken Arabic digit recognition; Feature extraction; Signal processing; Speech analysis; Speech recognition; Synthetic aperture sonar; Testing; Communication; Toeplitz matrix (TM) eigenvalues; humatronics; linear predictive coding; processing and recognition; understanding speech

fLanguage

English

Journal_Title

Industrial Electronics, IEEE Transactions on

Publisher

IEEE

ISSN

0278-0046

Type

jour

DOI

10.1109/TIE.2007.891647

Filename

4126822