DocumentCode
3089309
Title
Learned features versus engineered features for semantic video indexing
Author
Budnik, Mateusz ; Gutierrez-Gomez, Efrain-Leonardo ; Safadi, Bahjat ; Quenot, Georges
Author_Institution
LIG, Univ. Grenoble Alpes, Grenoble, France
fYear
2015
fDate
10-12 June 2015
Firstpage
1
Lastpage
6
Abstract
In this paper, we compare “traditional” engineered (hand-crafted) features (or descriptors) and learned features for content-based semantic indexing of video documents. Learned (or semantic) features are obtained by training classifiers for other target concepts on other data. These classifiers are then applied to the current collection. The vector of classification scores is the new feature used for training a classifier for the current target concepts on the current collection. If the classifiers used on the other collection are of the Deep Convolutional Neural Network (DCNN) type, it is possible to use as a new feature not only the score values provided by the last layer but also the intermediate values corresponding to the outputs of all the hidden layers. We made an extensive comparison of the performance of such features with traditional engineered ones, as well as with combinations of them. The comparison was made in the context of the TRECVid semantic indexing task. Our results confirm those obtained for still images: features learned from other training data generally outperform engineered features for concept recognition. Additionally, we found that directly training SVM classifiers using these features does significantly better than partially retraining the DCNN to adapt it to the new data. We also found that, even though the learned features performed better than the engineered ones, the fusion of both performs significantly better still, indicating that engineered features remain useful, at least in this case.
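The transfer pipeline described in the abstract can be sketched in a few lines: a network trained on other concepts produces hidden-layer activations and last-layer scores, which are concatenated into a feature vector for training a new classifier on the current collection. The sketch below is illustrative only, assuming random weights as a stand-in for a pretrained DCNN and a nearest-centroid classifier in place of the SVMs used in the paper; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pretrained" network (random weights, for illustration only);
# in the paper's setting these layers would come from a DCNN trained on
# OTHER target concepts and OTHER data.
W1 = rng.normal(size=(64, 16)) / 8.0   # input dim 64 -> hidden dim 16
W2 = rng.normal(size=(16, 8)) / 4.0    # hidden dim 16 -> 8 concept scores

def learned_features(X):
    """Concatenate hidden-layer activations with last-layer concept
    scores, mirroring the idea of reusing both as features."""
    H = np.maximum(X @ W1, 0.0)            # hidden-layer outputs (ReLU)
    S = 1.0 / (1.0 + np.exp(-(H @ W2)))    # last-layer "concept scores"
    return np.hstack([H, S])               # 16 + 8 = 24-dim feature

# Toy "current collection": two classes shifted along a random direction.
d = rng.normal(size=64)
X0 = rng.normal(size=(100, 64)) - d
X1 = rng.normal(size=(100, 64)) + d
F0, F1 = learned_features(X0), learned_features(X1)

# Train a trivial classifier (nearest class centroid) in feature space;
# the paper trains SVMs, but the shape of the pipeline is the same.
c0, c1 = F0.mean(axis=0), F1.mean(axis=0)

def predict(F):
    d0 = np.linalg.norm(F - c0, axis=1)
    d1 = np.linalg.norm(F - c1, axis=1)
    return (d1 < d0).astype(int)

acc = np.concatenate([predict(F0) == 0, predict(F1) == 1]).mean()
print(f"accuracy on toy data: {acc:.2f}")
```

The same shape generalizes directly: replace the random layers with activations extracted from a real pretrained DCNN and the centroid rule with an SVM per target concept.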
Keywords
convolution; document image processing; feature extraction; image classification; indexing; neural nets; support vector machines; video retrieval; video signal processing; DCNN; SVM classifiers; TRECVid semantic indexing task; classification scores; classifiers training; content-based semantic indexing; deep convolutional neural network; hand-crafted descriptors; hand-crafted features; learned features; semantic features; semantic video indexing; still images; traditional engineered features; vector; video documents; Feature extraction; Histograms; Indexing; Semantics; Training; Training data; Visualization;
fLanguage
English
Publisher
ieee
Conference_Titel
2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)
Conference_Location
Prague
Type
conf
DOI
10.1109/CBMI.2015.7153637
Filename
7153637
Link To Document