Keynote speech 4: Extraction of linguistic and paralinguistic information from audio-visual data

Author

Shrikanth Narayanan

Author_Institution

Univ. of Southern California, Los Angeles, CA, USA

fYear

2015

Firstpage

1

Lastpage

2

Abstract

Audio-visual data have been a key enabler of human observational research and practice. The confluence of sensing, communication and computing technologies is allowing the capture of and access to data, in diverse forms and modalities, in ways that were unimaginable even a few years ago. Importantly, these data afford the analysis and interpretation of multimodal cues of verbal and non-verbal human behavior. These signals carry crucial information about not only a person's intent and identity but also underlying attitudes and emotions. Automatically capturing these cues, although vastly challenging, offers the promise of not just efficient data processing but also tools for discovery that enable hitherto unimagined insights. Recent computational approaches that make judicious use of both data and knowledge have yielded significant advances in this regard, for example in deriving rich information from multimodal sources including human speech, language, and videos of visual behavior. This talk will focus on some of the advances and challenges in gathering such data and creating algorithms for machine processing of such cues. It will also introduce some of the freely available data resources for research.

Publisher

IEEE

Conference_Titel

Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015 International Conference

Type

conf

DOI

10.1109/ICSDA.2015.7357854

Filename

7357854