DocumentCode :
1788947
Title :
Tags and titles of videos you watched tell your gender
Author :
Tingting Feng ; Yuchun Guo ; Yishuai Chen ; Xiaoying Tan ; Ting Xu ; Baijun Shen ; Wei Zhu
Author_Institution :
Beijing Jiaotong Univ., Beijing, China
fYear :
2014
fDate :
10-14 June 2014
Firstpage :
1837
Lastpage :
1842
Abstract :
In online video systems, viewer demographic information (gender, age, etc.) is of great commercial value for delivering targeted advertising and video recommendations, but it is generally not directly available. This paper targets inferring viewers' gender from implicit watching history in large-scale online video systems. To tackle the sparsity problem without filtering out any cold users or videos, we not only introduce video tags as features, but also use an efficient Chinese word segmentation method to extract hot keywords from video titles as features. Moreover, users' viewing behavior is lognormally distributed, so we apply a logarithmic transformation to the inference matrices and then find key features via principal component analysis (PCA). We then treat gender inference as a classification problem and define modified evaluation metrics adapted to the imbalanced classification problem. We compare a set of classifiers, including Class prior, EM, SVM, Logistic regression, Partially supervised soft-label, and belief-based mixture, and find that Logistic regression performs best. The inference results show that our algorithms obtain high F̃1 values for all classes. On the PPTV dataset, the highest value reaches nearly 0.75. Inference based on keywords yields a 14.63% increase in F̃1 compared with inference based on MovieLens ratings.
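As a rough illustration of two ideas mentioned in the abstract, the sketch below applies a logarithmic transformation to a small count matrix and computes F1 separately for each class. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define the modified F̃1 metric, so the standard per-class F1 stands in for it, and the function names, the toy counts, and the "M"/"F" labels are all hypothetical.

```python
import math


def log_transform(counts):
    """Apply log(1 + x) to every entry of a user-by-feature count matrix."""
    return [[math.log1p(x) for x in row] for row in counts]


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, with F1 computed separately for each class."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


# Hypothetical toy data: rows are users, columns are tag/keyword counts.
counts = [[0, 3, 120], [1, 0, 7]]
transformed = log_transform(counts)

# Hypothetical imbalanced labels: 8 users labeled "M", 2 labeled "F".
y_true = ["M"] * 8 + ["F"] * 2
y_pred = ["M"] * 10  # a classifier that always predicts the majority class
f1 = per_class_f1(y_true, y_pred)
```

In this toy case the always-majority classifier is right for 8 of 10 users, yet its F1 for the minority class is zero, which is one reason a per-class metric is informative when classes are imbalanced.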
Keywords :
Internet; advertising; gender issues; pattern classification; principal component analysis; regression analysis; text analysis; Chinese word segmentation method; EM classifiers; MovieLens ratings; SVM classifiers; belief-based mixture; class prior classifiers; evaluation metrics; gender inference; imbalance classification problem; inference matrices; key-word extraction; large-scale online video systems; logarithmic transformation; logistic regression; partially supervised soft-label; principal component analysis; targeted advertising; user viewing behavior; video recommendations; video tags; video titles; viewer demographic information; Dictionaries; Feature extraction; Logistics; Measurement; Motion pictures; Principal component analysis; Videos; Logarithmic scale; Logistic regression; PCA; key words; tags;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communications (ICC), 2014 IEEE International Conference on
Conference_Location :
Sydney, NSW
Type :
conf
DOI :
10.1109/ICC.2014.6883590
Filename :
6883590