DocumentCode :
2575168
Title :
Scrapy-Based Crawling and User-Behavior Characteristics Analysis on Taobao
Author :
Wang, Jing ; Guo, Yuchun
Author_Institution :
Sch. of Electron. & Inf. Eng., Beijing Jiaotong Univ., Beijing, China
fYear :
2012
fDate :
10-12 Oct. 2012
Firstpage :
44
Lastpage :
52
Abstract :
The widespread use of the Internet provides a good environment for e-commerce. Studies of e-commerce network characteristics typically focus on Taobao. So far, research based on Taobao has addressed the credit rating system, marketing strategy, analysis of seller characteristics, and so on. The purpose of all these studies is to analyze online marketing transactions in e-commerce. In this paper, we analyze the e-commerce network from the perspective of graph theory. Our contributions lie in two aspects: (1) We crawl the Taobao share-platform using the Scrapy crawling architecture. After analyzing the format of Taobao web pages in depth, and combining two sampling methods, BFS and MHRW, we ran the crawler on five PCs for 30 days. We also list some major problems encountered during crawling and give our final solution. In addition, we crawled the data of one type of seller in order to analyze the relationships between sellers and buyers. (2) We analyze the characteristics of users' behavior on the Taobao share-platform based on the obtained dataset. We aim to find the relationships between sellers and buyers connected by items on the share-platform. Surprisingly, we find that the share-platform is a tool for some buyers to advertise items for sellers who have high credit scores, while other buyers only help them support the platform.
Keywords :
Internet; electronic commerce; marketing; sampling methods; BFS; Internet; MHRW; Scrapy crawl architecture; Web pages; crawl Taobao share-platform; credit rating system; e-commerce network characteristics; graph theory; marketing strategy; online marketing transactions; sampling methods; scrapy-based crawling; share-platform; user-behavior characteristics analysis; Communities; Crawlers; Engines; Marketing and sales; Sampling methods; Social network services; Web pages; MHRW; Scrapy; Taobao; bipartite graph; sampling method; user behavior;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2012 International Conference on
Conference_Location :
Sanya
Print_ISBN :
978-1-4673-2624-7
Type :
conf
DOI :
10.1109/CyberC.2012.17
Filename :
6384943