• DocumentCode
    1118274
  • Title
    An Effective Algorithm for Automatic Detection and Exact Demarcation of Breath Sounds in Speech and Song Signals
  • Author
    Ruinskiy, Dima ; Lavner, Yizhar
  • Author_Institution
    Department of Computer Science, Tel-Hai Academic College, Upper Galilee
  • Volume
    15
  • Issue
    3
  • fYear
    2007
  • fDate
    3/1/2007
  • Firstpage
    838
  • Lastpage
    850
  • Abstract
    Automatic detection of predefined events in speech and audio signals is a challenging and promising subject in signal processing. One important application of such detection is the removal or suppression of unwanted sounds in audio recordings, for instance in the professional music industry, where the demand for quality is very high. Breath sounds, which are present in most song recordings and often degrade the aesthetic quality of the voice, are an example of such unwanted sounds. Another example is poor pronunciation of certain phonemes. In this paper, we present an automatic algorithm for accurate detection of breaths in speech or song signals. The algorithm is based on a template matching approach and consists of three phases. In the first phase, a template is constructed from the mel frequency cepstral coefficient (MFCC) matrices of several breath examples and their singular value decompositions, to capture the characteristics of a typical breath event. Next, in the initial processing phase, each short-time frame is compared to the breath template and marked as breathy or nonbreathy according to predefined thresholds. Finally, an edge detection algorithm, based on various time-domain and frequency-domain parameters, is applied to demarcate the exact boundaries of each breath event and to eliminate possible false detections. Evaluation of the algorithm on a database of speech and songs containing several hundred breath sounds yielded a correct identification rate of 98% with a specificity of 96%.
  • Keywords
    audio recording; audio signal processing; cepstral analysis; frequency domain analysis; signal denoising; singular value decomposition; speech processing; time domain analysis; aesthetic voice quality; audio recordings; automatic breath sound detection; edge detection algorithm; exact breath sound demarcation; frequency-domain parameter; mel frequency cepstral coefficient matrices; professional music industry; signal processing; singular value decompositions; song recordings; song signal; speech signal; template matching; time-domain parameter; unwanted sound suppression; Degradation; Event detection; Matrix decomposition; Mel frequency cepstral coefficient; Music; Signal processing algorithms; Breath detection; event spotting in speech and audio; mel frequency cepstral coefficient (MFCC)
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2006.889750
  • Filename
    4100696
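The first two phases summarized in the abstract (building a breath template from MFCC matrices via SVD, then scoring each short-time frame against it and thresholding) can be illustrated with a small, dependency-free sketch. This is not the authors' method: the template choice (the leading left singular vector of the mean MFCC matrix), the energy-ratio similarity score, the 0.8 threshold, and all function names are hypothetical. MFCC extraction is assumed to happen elsewhere, and the final run-merging step is only a crude stand-in for the paper's edge-detection phase.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: rows = cepstral coefficients, columns = sub-frames.
Matrix = List[List[float]]


def mean_matrix(examples: Sequence[Matrix]) -> Matrix:
    """Element-wise mean of equally sized MFCC matrices (one per breath example)."""
    rows, cols = len(examples[0]), len(examples[0][0])
    n = len(examples)
    return [[sum(ex[i][j] for ex in examples) / n for j in range(cols)]
            for i in range(rows)]


def leading_left_singular_vector(a: Matrix, iters: int = 200) -> List[float]:
    """Power iteration on A A^T; returns the unit vector u1 belonging to the
    largest singular value of A (its sign is arbitrary)."""
    m = len(a)
    gram = [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(m)]
            for i in range(m)]
    u = [1.0 / math.sqrt(m)] * m  # assumed not orthogonal to u1
    for _ in range(iters):
        w = [sum(gram[i][j] * u[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        u = [x / norm for x in w]
    return u


def breath_similarity(frame: Matrix, u1: Sequence[float]) -> float:
    """Hypothetical score in [0, 1]: share of the frame's MFCC energy lying
    along the template direction u1 (squared, so the sign of u1 is irrelevant)."""
    cols = len(frame[0])
    proj = [sum(u1[i] * frame[i][j] for i in range(len(u1))) for j in range(cols)]
    total = sum(x * x for row in frame for x in row)
    return sum(p * p for p in proj) / total if total > 0 else 0.0


def mark_frames(frames: Sequence[Matrix], u1: Sequence[float],
                threshold: float = 0.8) -> List[bool]:
    """Phase-2 stand-in: flag a frame as breathy if its score reaches the threshold."""
    return [breath_similarity(f, u1) >= threshold for f in frames]


def flags_to_segments(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Crude phase-3 stand-in: merge runs of breathy frames into
    (first_frame, last_frame) pairs; the paper's edge detection is more refined."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(flags) - 1))
    return segments


if __name__ == "__main__":
    # Toy "MFCC" matrices (3 coefficients x 2 sub-frames), invented for illustration.
    breath_examples = [[[1.0, 2.0], [2.0, 4.0], [0.0, 0.0]],
                       [[2.0, 1.0], [4.0, 2.0], [0.0, 0.0]]]
    u1 = leading_left_singular_vector(mean_matrix(breath_examples))
    frames = [[[0.0], [0.0], [3.0]], [[1.0], [2.0], [0.0]], [[2.0], [4.1], [0.1]]]
    flags = mark_frames(frames, u1)
    print(flags, flags_to_segments(flags))
```

Run as a script, the toy data flags the second and third frames as breathy and merges them into a single segment. Squaring the projection makes the score independent of the arbitrary sign that any SVD routine may return for u1.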