Abstract
The main data-driven techniques for detecting cybersecurity attacks are based on the analysis of network traffic data and/or of application/system logs (stored on a host or on some other kind of device). A wide range of machine-learning techniques (and of possible alternative configurations of them) have been proposed in the literature for this purpose, but none of them has been shown to consistently outperform the others across different datasets. To ensure better accuracy and stability, the ensemble paradigm can be exploited as an effective solution for combining such techniques. However, as attack detection problems are hard to cope with and usually entail the analysis of large and fast streams of data, different types of ensembles (and of base algorithms composing them) should be experimented with, exploiting a distributed architecture to suitably reduce the long execution times needed to run them. To handle all these issues, this paper proposes a peer-to-peer (p2p) environment for validating ensemble-based approaches in the cybersecurity domain. Two case studies are analyzed by using this framework, concerning the detection of intrusions in network-traffic data and of deviant process instances, respectively. Preliminary scalability results demonstrate that the framework is a viable solution for these challenging kinds of problems.