• DocumentCode
    1297260
  • Title

    Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera

  • Author

    Hori, Takaaki ; Araki, Shoko ; Yoshioka, Takuya ; Fujimoto, Masakiyo ; Watanabe, Shinji ; Oba, Takanobu ; Ogawa, Atsunori ; Otsuka, Kazuhiro ; Mikami, Dan ; Kinoshita, Keisuke ; Nakatani, Tomohiro ; Nakamura, Atsushi ; Yamato, Junji

  • Author_Institution
    NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
  • Volume
    20
  • Issue
    2
  • fYear
    2012
  • Firstpage
    499
  • Lastpage
    513
  • Abstract
    This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and an omni-directional camera positioned at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speaker's channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g., speaking, laughing, watching someone) and the circumstances of the meeting (e.g., topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.
  • Keywords
    cameras; microphone arrays; speaker recognition; speech enhancement; advanced audio processing operation; conversation monitoring; low-latency real-time meeting recognition; microphone array; omnidirectional camera; overlapping speech signal enhancement; real-time meeting analyzer; speaker channel; speech recognition; Browsers; Cameras; Microphones; Real-time systems; Speech; Speech processing; Speech recognition; Distant microphones; meeting analysis; speaker diarization; speech enhancement; speech recognition; topic tracking;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2011.2164527
  • Filename
    5983476