Abstract :
A speech recognition system based on the psychoacoustic masking property of the human auditory system is proposed. The method uses several psychoacoustic properties of human perception to define a perceptual speech excitation function (the masking threshold) and the perceptual noise. Based on the auditory masking threshold (AMT), a time-frequency noise spectral subtraction is implemented. For a human listener, noise below the masking threshold is inaudible, so the objective is to minimize only the noise spectrum above the masking threshold. Additionally, we show that, for automatic speech recognition (ASR) applications, recognition performance may be improved further by augmenting the masking of the noise with spectral subtraction in the masked region as well. The strategy is to remove the masked noise in the ASR system, similar to the masking effect in the human auditory system. Based on the AMT and the estimated perceptual noise, we have implemented two spectral subtraction algorithms: a straightforward scheme that subtracts the total estimated perceptual noise from the noisy speech spectrum, and a spectral subtraction of only the noise that lies below the masking threshold. Both methods give significant improvements over the baseline PLP performance, with the latter method giving better recognition results.
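The two subtraction schemes can be sketched per frequency bin as follows. This is a minimal illustration, not the authors' implementation: the function names, the `floor` parameter, and the toy spectra are all hypothetical, and the masking threshold and noise estimate are assumed to be given rather than derived from a psychoacoustic model. The second function reflects one reading of the abstract, in which noise is subtracted only in bins where it falls below the threshold.

```python
import numpy as np

def subtract_total_noise(noisy_power, noise_power, floor=0.01):
    """Scheme 1: subtract the whole noise estimate from every bin."""
    cleaned = noisy_power - noise_power
    # Spectral floor keeps the result positive (a common safeguard, assumed here).
    return np.maximum(cleaned, floor * noisy_power)

def subtract_masked_noise(noisy_power, noise_power, masking_threshold, floor=0.01):
    """Scheme 2: subtract noise only in bins where it lies below the threshold."""
    masked = noise_power < masking_threshold
    # Bins whose noise exceeds the threshold are left unchanged in this sketch.
    cleaned = np.where(masked, noisy_power - noise_power, noisy_power)
    return np.maximum(cleaned, floor * noisy_power)

# Toy power spectra for one frame with four frequency bins (made-up values).
noisy = np.array([10.0, 4.0, 6.0, 2.0])
noise = np.array([1.0, 3.0, 0.5, 1.9])
threshold = np.array([2.0, 1.0, 1.0, 2.5])

total = subtract_total_noise(noisy, noise)
masked_only = subtract_masked_noise(noisy, noise, threshold)
```

In the toy frame, the second bin has noise above the threshold, so the second scheme leaves that bin untouched while the first scheme subtracts from it.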
Keywords :
hearing; speech recognition; auditory masking threshold; automatic speech recognition; human auditory system; perceptual noise; perceptual speech excitation function; psychoacoustics; time-frequency noise spectral subtraction; utilizing auditory masking; Humans; Masking threshold; Noise; Noise measurement; Psychoacoustics; Speech; Speech recognition;