Abstract :
Android applications are widely used, and many are "free" applications that include advertisement (ad) modules, which provide ad services and track user behavior statistics. However, these ad modules often collect users' personal information and device identification numbers along with usage statistics, which violates user privacy. In our analysis of the network traffic of 1,188 Android applications, we identified 797 applications that included 45 previously known ad modules. We analyzed the network behavior of these ad modules and found that they have characteristic network traffic patterns for acquiring ad content, specifically images. To accurately differentiate between ad module traffic and legitimate application traffic, we propose a novel method based on the distance between network traffic graphs, which map the relationships between HTTP session data (such as HTML or JavaScript). This distance describes the similarity between sessions. Using this method, we can detect ad module traffic by comparing session graphs with the graphs of already known ad modules. In our evaluation, we generated 20,903 graphs from application traffic. We separated these graphs into those generated by known ad modules (4,698 graphs), those we manually identified as ad modules (2,000 graphs), and standard application traffic. We then used 1,000 of the known ad graphs as references and compared them against the other ad graph sets (the remaining 3,698 known ad graphs and the 2,000 manually classified ad graphs) to measure how accurately they identify ad graphs. Our approach showed a 76% detection rate for known ad graphs and a 96% detection rate for manually classified ad graphs.
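The comparison step described above can be illustrated with a minimal sketch. The abstract does not specify the graph representation or the distance metric, so everything here is an assumption made for illustration: a session graph is modeled as a set of directed edges between content types (for example, an HTML page that loads a JavaScript file), the distance is a Jaccard distance over those edge sets, and the names `session_graph`, `graph_distance`, `is_ad_traffic`, and the threshold value are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical representation: an edge links the content type of a
# referring HTTP response to the content type of the resource it loads.
Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def session_graph(links: Iterable[Edge]) -> Graph:
    """Build a session graph from (referrer_type, loaded_type) pairs."""
    return frozenset(links)


def graph_distance(a: Graph, b: Graph) -> float:
    """Jaccard distance over edge sets: 0.0 is identical, 1.0 is disjoint.

    This metric is an assumption; the paper's actual distance may differ.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def is_ad_traffic(candidate: Graph,
                  known_ad_graphs: Iterable[Graph],
                  threshold: float = 0.5) -> bool:
    """Flag a session as ad traffic if it is close to any known ad graph.

    The threshold of 0.5 is an illustrative value, not one from the paper.
    """
    return any(graph_distance(candidate, known) <= threshold
               for known in known_ad_graphs)


# Illustrative data only: one known ad pattern that ends in an image fetch.
known = [session_graph([("html", "javascript"), ("javascript", "image")])]
candidate = session_graph([("html", "javascript"),
                           ("javascript", "image"),
                           ("html", "css")])
unrelated = session_graph([("json", "json")])

print(is_ad_traffic(candidate, known))
print(is_ad_traffic(unrelated, known))
```

In this sketch the candidate shares two of three edges with the known ad graph, so it falls within the threshold, while the unrelated session shares none. A real implementation would need a richer graph model and a threshold tuned on labeled traffic.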
Keywords :
hypermedia; network theory (graphs); smart phones; transport protocols; Android application; HTTP session data; advertisement module network behavior; graph modelling; network traffic pattern; Androids; Google; HTML; Humanoid robots; Privacy; Servers; Uniform resource locators; Android; Network Security; Smartphone