• DocumentCode
    1229606
  • Title

    A Speech-and-Speaker Identification System: Feature Extraction, Description, and Classification of Speech-Signal Image

  • Author

    Saeed, Khalid ; Nammous, Mohammad Kheir

  • Author_Institution
    Bialystok Tech. Univ.
  • Volume
    54
  • Issue
    2
  • fYear
    2007
  • fDate
    4/1/2007 12:00:00 AM
  • Firstpage
    887
  • Lastpage
    897
  • Abstract
    This paper discusses a speech-and-speaker (SAS) identification system based on spoken Arabic digit recognition. The speech signals of the Arabic digits from zero to ten are processed graphically (the signal is treated as an object image for further processing). The identifying and classifying methods are performed with Burg's estimation model and the algorithm of Toeplitz matrix minimal eigenvalues as the main tools for signal-image description and feature extraction. At the stage of classification, both conventional and neural-network-based methods are used. The success rate of the speaker-identifying system obtained in the presented experiments for individually uttered words is excellent and has reached about 98.8% in some cases. The miss rate of about 1.2% was almost only because of false acceptance (13 miss cases in 1100 tested voices). These results have promisingly led to the design of a security system for SAS identification. The average overall success rate was then 97.45% in recognizing one uttered word and identifying its speaker, and 92.5% in recognizing a three-digit password (three individual words), which is really a high success rate because, for compound cases, we should successfully test all the three uttered words consecutively in addition to and after identifying their speaker; hence, the probability of making an error is basically higher. The authors' major contribution to this task involves building a system to recognize both the uttered words and their speaker through an innovative graphical algorithm for feature extraction from the voice signal. This Toeplitz-based algorithm reduces the amount of computations from operations on an n × n matrix that contains n² different elements to a matrix (of Toeplitz form) that contains only n elements that are different from each other.
  • Keywords
    Toeplitz matrices; eigenvalues and eigenfunctions; error statistics; neural nets; security; speaker recognition; speech processing; Burg estimation model; Toeplitz matrix minimal eigenvalue; conventional-neural-network; error probability; graphical algorithm; security system; speech-signal image; speech-speaker identification system; spoken Arabic digit recognition; Eigenvalues and eigenfunctions; Feature extraction; Security; Signal processing; Speaker recognition; Speech analysis; Speech processing; Speech recognition; Synthetic aperture sonar; Testing; Communication; Toeplitz matrix (TM) eigenvalues; humatronics; linear predictive coding; processing and recognition; speaker recognition; speech analysis; understanding speech
  • fLanguage
    English
  • Journal_Title
    Industrial Electronics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0278-0046
  • Type

    jour

  • DOI
    10.1109/TIE.2007.891647
  • Filename
    4126822
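
The closing claim of the abstract (a Toeplitz matrix of size n × n is fully determined by n distinct values, and its minimal eigenvalue serves as a feature) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the matrix entries are derived from the speech-signal image, so the coefficient vector `coeffs`, the symmetric Toeplitz form, and the helper names `toeplitz_from_coeffs` and `minimal_eigenvalue` are assumptions made only for illustration.

```python
import numpy as np


def toeplitz_from_coeffs(c):
    """Build a symmetric n-by-n Toeplitz matrix with entry (i, j) = c[|i - j|].

    Assumption: a symmetric form is used here for simplicity; the abstract
    does not specify the exact construction.
    """
    c = np.asarray(c, dtype=float)
    k = np.arange(c.size)
    idx = np.abs(k[:, None] - k[None, :])  # index of the diagonal each entry lies on
    return c[idx]


def minimal_eigenvalue(c):
    """Return the smallest eigenvalue of the symmetric Toeplitz matrix built from c."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(toeplitz_from_coeffs(c))[0])


# Hypothetical coefficients standing in for values extracted from a signal image.
coeffs = [4.0, 1.0, 0.5, 0.25]
T = toeplitz_from_coeffs(coeffs)

n_distinct = np.unique(T).size          # only n distinct values, not n * n
lam_min = minimal_eigenvalue(coeffs)    # one scalar feature for this matrix
```

Here the 4 × 4 matrix holds 16 entries but only 4 distinct values, which is the storage reduction the abstract refers to; how the resulting eigenvalues are assembled into a feature vector for classification is left open, since the abstract gives no detail.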