Abstract :
The Alibaba mobile recommendation competition asks participants to predict users' purchase behavior from real user-commodity behavior data collected on Alibaba's m-commerce platform. In this paper, we treat the task as a binary classification problem over user/item pairs, explore several important factors closely related to purchase behavior, extract useful features from the data set, and make predictions with machine learning models. Because some of the location information is missing from the data set, we propose an algorithm to estimate the missing values. Because purchases are far rarer than non-purchases, the data set is highly imbalanced; to deal with this problem, we propose a novel and effective sampling method that undersamples the majority class. Without any blending strategy, our method achieves a good F1-score with a small training set using a single Gradient Boosting Decision Tree model, which placed in the top 10 on the final leaderboard of the competition. After the competition, the team that won third place shared their feature set with us for experimental purposes; it is larger than ours and may reduce the danger of overfitting. With this feature set, our method achieved a state-of-the-art result of 8.79% with a single model, outperforming the best result of 8.78% on the final leaderboard.
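To make the imbalance handling and the evaluation metric concrete, the following is a minimal sketch of plain random undersampling and of an F1-score computed over predicted user/item pairs. It is not the sampling method proposed in this paper, which the abstract does not specify; the function names, the `ratio` parameter, and the toy data are assumptions introduced purely for illustration.

```python
import random


def undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every positive example and a random subset of negatives.

    `ratio` (hypothetical parameter) is the number of negatives kept
    per positive. This is plain random undersampling, used here only
    as a stand-in for the paper's own sampling method.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def f1_score(predicted_pairs, actual_pairs):
    """F1 over sets of (user, item) pairs predicted vs. actually purchased."""
    predicted, actual = set(predicted_pairs), set(actual_pairs)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy data: 5 purchases among 100 user/item pairs.
rows = list(range(100))
labels = [1] * 5 + [0] * 95
sampled_rows, sampled_labels = undersample(rows, labels, ratio=2.0)

# One of two predicted pairs is a real purchase, and one of two real
# purchases is predicted, so precision and recall are both 0.5.
score = f1_score({("u1", "i1"), ("u2", "i2")}, {("u1", "i1"), ("u3", "i3")})
```

In such a setup, the undersampled rows would serve as the training set for the classifier, while the F1-score would be computed on the full, unsampled set of predicted pairs.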
Keywords :
"Training","Sampling methods","Mobile communication","Data models","Predictive models","Conferences","Data mining"