DocumentCode :
3670176
Title :
Multimodal object recognition from visual and audio sequences
Author :
Weipeng He;Haojun Guan;Jianwei Zhang
Author_Institution :
TAMS, Department of Informatics, University of Hamburg, Vogt-Kö
fYear :
2015
Firstpage :
133
Lastpage :
138
Abstract :
This paper describes a visual-audio object recognition system based on hidden Markov models (HMMs). The system uses a bag-of-words model over scale-invariant feature transform (SIFT) descriptors as the visual feature and mel-frequency cepstral coefficients (MFCCs) as the audio feature. Objects are classified based on probabilities computed with the learned HMMs. Two fusion methods are used in the system: feature fusion and decision fusion. The former learns a joint probability distribution with a single HMM, while the latter learns a separate distribution for each modality and combines them under a conditional independence assumption. Experiments on a dataset of 33 different household objects evaluate the performance of the two fusion methods as well as the unimodal approaches. The results show that both fusion methods outperform the unimodal methods, while the two fusion methods are mostly comparable to each other.
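The decision fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that each modality's HMM has already produced a log-likelihood per object class; under conditional independence given the class, the joint log-likelihood is the sum of the two. The function name, the class labels, and the numeric values are hypothetical and are not taken from the paper.

```python
def decision_fusion(visual_loglik, audio_loglik, log_prior=None):
    """Fuse per-class log-likelihoods from two modality-specific models.

    Assumes conditional independence given the class c:
        log p(v, a | c) = log p(v | c) + log p(a | c)
    An optional log prior per class can be added; by default it is uniform.
    """
    # Only score classes for which both modalities supplied a value.
    classes = visual_loglik.keys() & audio_loglik.keys()
    scores = {}
    for c in classes:
        prior = 0.0 if log_prior is None else log_prior[c]
        scores[c] = visual_loglik[c] + audio_loglik[c] + prior
    # Predict the class with the highest fused score.
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods, standing in for the outputs of
# per-class visual and audio HMMs (illustrative values only).
visual = {"cup": -120.0, "bottle": -118.5, "keys": -140.0}
audio = {"cup": -60.0, "bottle": -75.0, "keys": -58.0}

label, fused = decision_fusion(visual, audio)
print(label, fused[label])
```

In this made-up example the visual scores alone favour "bottle" and the audio scores alone favour "keys", while the fused score favours "cup", which shows how combining modalities can change the decision. Feature fusion, by contrast, would concatenate the two feature streams and train a single HMM per class on the joint observations.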
Keywords :
"Hidden Markov models","Visualization","Object recognition","Joints","Feature extraction","Videos","Covariance matrices"
Publisher :
IEEE
Conference_Title :
Multisensor Fusion and Integration for Intelligent Systems (MFI), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/MFI.2015.7295798
Filename :
7295798