Title :
Simultaneous multispeaker segmentation for automatic meeting recognition
Author :
Laskowski, Kornel ; Fugen, Christian ; Schultz, Tanja
Author_Institution :
interACT, Univ. Karlsruhe, Karlsruhe, Germany
Abstract :
Vocal activity detection is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, and standard vocal activity detection algorithms for close-talk microphones have been shown to be ineffective. This is primarily due to the problem of crosstalk, in which a participant's speech appears on other participants' microphones, making it hard to attribute detected speech to its correct speaker. We describe an automatic multichannel segmentation system for meeting recognition, which accounts for both the observed acoustics and the inferred vocal activity states of all participants using joint multi-participant models. Our experiments show that this approach almost completely eliminates the crosstalk problem. Recent improvements to the baseline reduce the development set word error rate, achieved by a state-of-the-art multi-pass speech recognition system, by 62% relative to manual segmentation. We also observe significant performance improvements on unseen data.
Keywords :
speaker recognition; automatic meeting recognition; automatic multichannel segmentation system; automatic speech recognition; automatic speech understanding; multispeaker segmentation; vocal activity detection; word error rate; Acoustics; Crosstalk; Manuals; Microphones; Silicon; Speech; Training
Conference_Title :
Signal Processing Conference, 2007 15th European
Conference_Location :
Poznan
Print_ISBN :
978-839-2134-04-6