Abstract:
Recommendation engines have attracted wide attention in the big data field. To promote the application of big data, Alibaba Group organized a big data recommendation competition that provided participants with a big data processing platform and one billion behavior records. The competition required participants to learn a model from users' behaviors over one month and then predict their purchase behavior on the following day. Four kinds of behavior are included: browse, add-to-cart, collection, and purchase. The F1-score is used as the evaluation metric. Our team achieved the top score of 8.78%, a result we attribute to three aspects. First, we model the recommendation problem as a binary classification problem and design a hierarchical model. Second, to improve the performance of single classifiers, we adopt a sample filtering strategy that selects valuable samples for training, which both boosts performance and speeds up training. Third, a classifier fusion strategy is used to improve the final performance. This paper details our hierarchical model and the key techniques adopted for this competition. The hierarchical model also serves as the data processing framework and consists of four layers: 1) a sample filtering layer, which removes a large number of low-value samples and reduces computational complexity; 2) a feature extraction layer, which extracts extensive features to characterize the samples from multiple views; 3) a classification layer, which trains several classifiers using different sampling strategies and feature groups; and 4) a fusion layer, which combines the results of the different classifiers to obtain a better final prediction. Our competition score demonstrates the reasonableness and feasibility of the model.
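To make the evaluation and the last two layers concrete, here is a minimal sketch, not the authors' implementation. It assumes the standard set-based F1 over predicted versus actual (user, item) purchase pairs, and it stands in for the fusion layer with plain score averaging followed by a top-K cutoff; the paper's actual fusion strategy may differ. The functions `fuse_scores`, `top_k`, and `f1_score`, the cutoff `k`, and the toy scores are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A (user_id, item_id) pair; a positive means the user buys the item on the target day.
Pair = Tuple[str, str]


def f1_score(predicted: Iterable[Pair], actual: Iterable[Pair]) -> float:
    """Set-based F1: harmonic mean of precision and recall over purchase pairs."""
    pred, ref = set(predicted), set(actual)
    hits = len(pred & ref)
    if hits == 0:
        return 0.0  # also covers an empty prediction set or an empty reference set
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def fuse_scores(score_maps: List[Dict[Pair, float]]) -> Dict[Pair, float]:
    """Hypothetical fusion layer: average each pair's score over the classifiers that scored it."""
    totals: Dict[Pair, float] = {}
    counts: Dict[Pair, int] = {}
    for scores in score_maps:
        for pair, score in scores.items():
            totals[pair] = totals.get(pair, 0.0) + score
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


def top_k(scores: Dict[Pair, float], k: int) -> Set[Pair]:
    """Predict a purchase for the k highest-scoring pairs (hypothetical cutoff rule)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Toy scores from two hypothetical classifiers trained on different feature groups.
clf_a = {("u1", "i1"): 0.9, ("u1", "i2"): 0.2, ("u2", "i3"): 0.6}
clf_b = {("u1", "i1"): 0.7, ("u1", "i2"): 0.4, ("u2", "i3"): 0.8}

predicted = top_k(fuse_scores([clf_a, clf_b]), k=2)
actual = {("u1", "i1"), ("u3", "i4")}  # toy ground-truth purchases
# One hit out of two predictions and two true purchases: precision 0.5, recall 0.5, F1 0.5.
print(sorted(predicted), f1_score(predicted, actual))
```

Because F1 penalizes both missed purchases and false alarms, the cutoff `k` trades precision against recall; under this sketch's assumptions it would be tuned on held-out days rather than fixed in advance.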
Keywords:
"Feature extraction","Big data","Training","Filtering","Algorithm design and analysis","Data models","Computational modeling"