• DocumentCode
    319600
  • Title

    Estimation of speaker position using audio information

  • Author

    Vahedian, Abedin ; Frater, Michael ; Arnold, John ; Cavenor, Mike ; Godara, Lal ; Pickering, Mark

  • Author_Institution
    Sch. of Electr. Eng., New South Wales Univ., Canberra, NSW, Australia
  • Volume
    1
  • fYear
    1997
  • fDate
    4-4 Dec. 1997
  • Firstpage
    181
  • Abstract
    Real-time conversational video telecommunications services, such as video-conferencing, are becoming ever more important as a substitute for face-to-face meetings. One of the perceived weaknesses of existing services is the picture quality achieved, especially around the face of a speaker. A possible solution would be to identify the location of the face, which is then transmitted at a higher quality than the rest of the picture. In this paper, we present a new technique for identifying the face using an array of microphones. As opposed to other techniques proposed so far, which make assumptions about the content of the video material, the idea relies on estimating the lip position by audio processing of the speaker's speech. Once this estimation is performed, a two- or possibly three-stage quantisation of the video information allows the subjectively more important parts, i.e. the face of a speaker, to be compressed with lower distortion. This new technique, which is compatible with all existing video compression standards, is much cheaper and easier to implement than previous techniques.
  • Keywords
    acoustic signal processing; acoustic transducer arrays; array signal processing; data compression; direction-of-arrival estimation; microphones; teleconferencing; video coding; audio information; audio processing; compression; distortion; face; lip position; microphones; picture quality; quantisation; real-time conversational video telecommunications services; speaker position; video compression standards; video information; video material; videoconferencing; Australia; Lips; Loudspeakers; Microphone arrays; Psychology; Quantization; Speech processing; Telecommunication services; Video compression; Videoconference;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON '97. IEEE Region 10 Annual Conference. Speech and Image Technologies for Computing and Telecommunications., Proceedings of IEEE
  • Conference_Location
    Brisbane, Qld., Australia
  • Print_ISBN
    0-7803-4365-4
  • Type

    conf

  • DOI
    10.1109/TENCON.1997.647287
  • Filename
    647287
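
The abstract does not say how the microphone array is processed, so the following is only an illustrative sketch of one common approach to audio-based localisation, not the authors' method. It assumes two microphones, a far-field source, and a brute-force cross-correlation to find the time difference of arrival; the names `estimate_delay`, `bearing_from_delay`, and the constant `SPEED_OF_SOUND` are hypothetical.

```python
import math

# Assumed speed of sound in air at room temperature (m/s); not from the record.
SPEED_OF_SOUND = 343.0


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag, in samples, by which sig_b trails sig_a.

    Uses a brute-force cross-correlation over lags in [-max_lag, max_lag].
    Both signals are assumed to have the same length.
    """
    n = len(sig_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(lag, sample_rate, mic_spacing):
    """Convert a sample lag into a far-field bearing (radians from broadside).

    The ratio is clamped to [-1, 1] so that a noisy lag cannot push
    math.asin outside its domain.
    """
    tau = lag / sample_rate
    ratio = SPEED_OF_SOUND * tau / mic_spacing
    return math.asin(max(-1.0, min(1.0, ratio)))


def make_pulse(length, start):
    """Build a toy signal: zeros with a short triangular pulse at `start`."""
    sig = [0.0] * length
    for k, value in enumerate([1.0, 2.0, 3.0, 2.0, 1.0]):
        sig[start + k] = value
    return sig


if __name__ == "__main__":
    mic_a = make_pulse(64, 20)
    mic_b = make_pulse(64, 23)  # same pulse, arriving three samples later
    lag = estimate_delay(mic_a, mic_b, max_lag=10)
    angle = bearing_from_delay(lag, sample_rate=16000, mic_spacing=0.2)
    print(lag, math.degrees(angle))
```

In a scheme like the one the abstract outlines, a bearing of this kind would then be mapped to an image region that receives finer quantisation; that mapping depends on the camera geometry and is not sketched here.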