Title :
Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation
Author :
Jian Tu ; Zuxuan Wu ; Qi Dai ; Yu-Gang Jiang ; Xiangyang Xue
Author_Institution :
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
Abstract :
We participated in the Huawei Accurate and Fast Mobile Video Annotation Challenge (MoVAC) at IEEE ICME 2014. We submitted three result runs that combine different features and classification techniques, with emphasis on both accuracy and efficiency. In this paper, we briefly summarize the techniques used in our system and the components used to generate each of the three submitted results. One novel component of our system is a specially tailored deep neural network (DNN) that explores the relationships among multiple features to improve annotation performance; implemented on a GPU, it is also very efficient. One of our DNN-based submissions needed only 18.8 seconds to process a test video. By combining the DNN with traditional SVM learning, we achieved the best accuracy among all submissions to this challenge worldwide.
Keywords :
image classification; image fusion; learning (artificial intelligence); mobile computing; neural nets; support vector machines; video signal processing; DNN; Huawei challenge; IEEE ICME 2014; MoVAC; SVM learning; deep neural network; mobile video annotation challenge; multimodal feature fusion; video classification; Accuracy; Feature extraction; Mel frequency cepstral coefficient; Support vector machines; Training; Vectors; Visualization; Video annotation; deep neural network; multimodal features
Conference_Title :
Multimedia and Expo Workshops (ICMEW), 2014 IEEE International Conference on
Conference_Location :
Chengdu
DOI :
10.1109/ICMEW.2014.6890609