• DocumentCode
    1688805
  • Title
    Channel-mapping for speech corpus recycling
  • Author
    Ichikawa, Osamu ; Rennie, Steven J. ; Fukuda, Toshio ; Nishimura, M.
  • Author_Institution
    IBM Res. - Tokyo, Tokyo, Japan
  • fYear
    2013
  • Firstpage
    7160
  • Lastpage
    7164
  • Abstract
    The performance of automatic speech recognition (ASR) is heavily dependent on the acoustic environment in the target domain. Large investments have focused on ways to record speech data in specific environments. In contrast, recent Internet services using hand-held devices such as smartphones have created opportunities to acquire huge amounts of “live” speech data at low cost. There are practical demands to reuse this abundant data in different acoustic environments. To transform such source data for a target domain, developers can use channel mapping and noise addition. However, channel mapping of the data is difficult without stereo mapping data or impulse response data. We tested GMM-based channel mapping with a vector Taylor series (VTS) formulation on a per-utterance basis. We found this type of channel mapping effectively simulated our target domain data.
  • Keywords
    Internet; recycling; speech recognition; ASR; GMM; Internet service; VTS formulation; acoustic environment; automatic speech recognition; channel mapping; hand-held device; impulse response data; smartphone; speech corpus recycling; speech data recording; stereo mapping data; target domain data simulation; vector Taylor series formulation; Acoustics; Adaptation models; Data models; Hidden Markov models; Noise; Speech; Speech recognition; channel normalization; feature adaptation; noise reduction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6639052
  • Filename
    6639052