• DocumentCode
    3096345
  • Title

    Segmenting acoustic signal with articulatory movement using Recurrent Neural Network for phoneme acquisition

  • Author

    Kanda, Hisashi ; Ogata, Tetsuya ; Komatani, Kazunori ; Okuno, Hiroshi G.

  • Author_Institution
    Grad. Sch. of Inf., Kyoto Univ., Kyoto
  • fYear
    2008
  • fDate
    22-26 Sept. 2008
  • Firstpage
    1712
  • Lastpage
    1717
  • Abstract
    This paper proposes a computational model for phoneme acquisition by infants. Human infants perceive speech sounds not as discrete phoneme sequences but as continuous acoustic signals. One of the critical problems in phoneme acquisition is how to segment these continuous speech sounds. The key idea for solving this problem is that articulatory mechanisms such as the vocal tract help human beings to perceive speech sound units corresponding to phonemes. That is, the ability to distinguish phonemes is learned by recognizing unstable points in the dynamics of continuous sound with articulatory movement. We have developed a vocal imitation system embodying the relationship between articulatory movements and the sounds produced by those movements. To segment acoustic signals together with articulatory movements, we apply to our system a segmentation method based on a recurrent neural network with parametric bias (RNNPB). This method determines multiple segmentation boundaries in a temporal sequence using the prediction error of the RNNPB model, and the PB values obtained by the method can be encoded as a kind of phoneme. Our system was implemented using a physical vocal tract model called the Maeda model. Experimental results demonstrated that our system can self-organize the same phonemes in different continuous sounds. This suggests that our model reflects the process of phoneme acquisition.
  • Keywords
    acoustic signal processing; recurrent neural nets; speech processing; Maeda model; acoustic signal segmentation; articulatory movement; continuous acoustic signals; discrete phoneme sequences; human infants; parametric bias; phoneme acquisition; recurrent neural network; speech sounds; vocal tract model; Acoustics; Humans; Pediatrics; Recurrent neural networks; Shape; Silicon; Speech;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on
  • Conference_Location
    Nice
  • Print_ISBN
    978-1-4244-2057-5
  • Type

    conf

  • DOI
    10.1109/IROS.2008.4651060
  • Filename
    4651060