Author :
Clementi, Andreas ; Stelzhammer, Peter ; Colon Osorio, Fernando C.
Abstract :
In the past, several methods have been used to select the Malware attack samples, the so-called Stimulus Workload (SW), used in Malware-detection tests of endpoint security products. In the selection process, one must be aware that some of the selected samples pose a greater threat to users than others: they are more widespread and hence more likely to affect a user. Some may target a specific company or user base but present less risk to other users. Other Malware attack samples may only be found on specific websites, affect specific countries/regions, or only be relevant to particular operating system versions or interface languages (English, German, Chinese, and so forth). Unfortunately, due to such variability, the selection of samples can and will skew the results dramatically. For this reason, over the last several years, the Security Effectiveness Measurement Community & Ecosystem (SEMCE) has begun adopting a test methodology that requires strict adherence to standards. The primary reason for adopting this methodology, first described in [1], is to ensure the reproducibility and reliability of test results. The methodology requires that the stimulus workload used be a reliable proxy for the actual environment that the products are expected to encounter in the wild. In this manuscript, we present the measured effectiveness of end-point security protection products when the selected stimulus workload (SW) takes into consideration variabilities such as the ones described above. We call these workloads Customizable Stimulus Workloads (CSWs), and our results show great variance in the effectiveness of end-point products when such CSWs are used. Our evaluation of end-point security products uses a simple metric, namely missed detections. The generation of the CSWs depended heavily on Microsoft's global telemetry data gathered in 2013 and 2014 for Microsoft Windows updates.
Twenty-two (22) end-point security products were evaluated using this methodology. The results obtained show great variability between the miss ratios, meaning the number of Malware samples a product failed to detect, and the customer impact coefficient amongst vendors. For example, two end-point protection products with similar miss percentages of 0.2 % and 0.4 % showed dramatically different customer impact coefficients of 0.001209 and 0.018903 respectively. That is, when miss percentages were normalized for factors such as prevalence, operating system, language, and so forth, systems protected by one vendor were roughly 16 times more likely to suffer an infection than their counterpart.
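The divergence between raw miss rate and customer impact can be illustrated with a minimal sketch. The function and weighting scheme below are illustrative assumptions, not the authors' actual coefficient: each sample carries a prevalence weight (the fraction of the user base exposed to it), and the impact of a product is the summed weight of the samples it missed.

```python
# Hypothetical sketch of a prevalence-weighted miss metric.
# The names and the weighting scheme are illustrative assumptions,
# not the formula used in the study.

def customer_impact(misses, prevalence):
    """Sum of prevalence weights over missed samples.

    misses:     list of bools, True if the product missed sample i.
    prevalence: list of floats, fraction of the user base exposed to sample i.
    """
    return sum(p for missed, p in zip(misses, prevalence) if missed)

# Two products with the same raw miss count (one miss each) can differ
# sharply once prevalence is factored in:
prev = [0.10, 0.001, 0.001, 0.001]        # one widespread sample, three rare ones
product_a = [True, False, False, False]   # misses the widespread sample
product_b = [False, True, False, False]   # misses a rare sample

print(customer_impact(product_a, prev))   # 0.1
print(customer_impact(product_b, prev))   # 0.001
```

Under this toy weighting, both products have a 25 % raw miss rate, yet product A's impact is 100 times larger, which is the kind of gap the abstract reports between products with near-identical miss percentages.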
Keywords :
customer services; invasive software; program testing; CSW; Microsoft Windows updates; Microsoft global telemetry data; customer impact coefficient; customizable stimulus workloads; end-point protection products; endpoint security product comparative detection testing; global prevalence weighting; local prevalence weighting; malware; missed attack sample impacts; Decision support systems; Malware; Software;