Regional Information Center for Science and Technology - Increased MFCC filter bandwidth for noise-robust phoneme recognition

Abstract:

Many speech recognition systems use mel-frequency cepstral coefficient (MFCC) feature extraction as a front end. In this algorithm, the speech spectrum passes through a bank of mel-spaced triangular filters; the filter output energies are log-compressed and transformed to the cepstral domain by the discrete cosine transform (DCT). The spacing of the filter center frequencies mimics the known warped-frequency characteristics of the human auditory system, yet the filter bandwidths are not chosen through biological inspiration. Instead, they are set by aligning the endpoints of each triangle with the center frequencies of its neighbors, and the triangle is itself an arbitrary shape. It is surprising that, for such a popular speech recognition front end, the filter bandwidths have not been properly analyzed or optimized. Complex cochlear models use realistic filter shapes that more closely approximate critical bands, and these filters are considerably wider than the MFCC filters and overlap more with their neighbors. We have extended this filter characteristic to the MFCC algorithm and found that the increased filter bandwidth improves recognition performance in clean speech and also provides added noise robustness.
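The MFCC pipeline summarized above (mel-spaced triangular filters, log compression, DCT) can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the common 2595·log10(1 + f/700) mel formula and an unnormalized DCT-II. The `width_scale` parameter is hypothetical; the abstract does not say how the wider filters were parameterized, so it only illustrates the idea of widening each triangle about its center on the mel axis. All function names are illustrative.

```python
import math


def hz_to_mel(f):
    # Common mel-scale formula (one of several in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, width_scale=1.0):
    """Return n_filters rows of triangular weights over n_fft // 2 + 1 bins.

    Centers are equally spaced on the mel axis. With width_scale == 1.0 each
    triangle's endpoints sit on its neighbors' centers (standard MFCC).
    width_scale > 1.0 is a HYPOTHETICAL widening: both endpoints move outward
    on the mel axis by that factor while the center stays fixed.
    """
    n_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    centers = [mel_lo + step * (i + 1) for i in range(n_filters)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for c in centers:
        left = mel_to_hz(c - width_scale * step)
        center = mel_to_hz(c)
        right = mel_to_hz(c + width_scale * step)
        row = []
        for f in bin_hz:
            if left < f <= center:      # rising edge
                row.append((f - left) / (center - left))
            elif center < f < right:    # falling edge
                row.append((right - f) / (right - center))
            else:                       # outside the triangle
                row.append(0.0)
        bank.append(row)
    return bank


def dct_ii(x, n_coeffs):
    # Unnormalized DCT-II of x, keeping the first n_coeffs terms.
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n_coeffs)]


def mfcc_frame(power_spectrum, bank, n_coeffs, floor=1e-10):
    # Filter energies -> log compression (floored to avoid log(0)) -> DCT.
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    log_e = [math.log(max(e, floor)) for e in energies]
    return dct_ii(log_e, n_coeffs)


if __name__ == "__main__":
    n_fft, sr = 512, 16000
    spectrum = [1.0] * (n_fft // 2 + 1)  # flat power spectrum, for illustration
    standard = mel_filterbank(20, n_fft, sr, width_scale=1.0)
    widened = mel_filterbank(20, n_fft, sr, width_scale=2.0)
    print(mfcc_frame(spectrum, standard, 13)[:3])
    print(mfcc_frame(spectrum, widened, 13)[:3])
```

With `width_scale=1.0` each triangle ends at its neighbors' centers, which is the endpoint-alignment rule described in the abstract; larger values make adjacent filters overlap more, so each log energy pools a wider band of spectral bins. Scaling and normalization conventions for the mel formula and the DCT vary between toolkits.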