Botnet detection based on network flows using inverse statistics
botnet, network flow, anomaly detection, inverse statistics
A botnet is a network of infected computers that are remotely controlled by a
cybercriminal, called the botmaster, whose objective is to carry out large-scale cyber
attacks such as DDoS, spam and information theft. Traditional botnet detection methods,
which are usually signature-based, cannot detect unknown botnets. Behavior-based analytics
has shown promise for detecting current botnets, whose behavior is constantly evolving.
Since botnet attacks on the IT infrastructure of the Brazilian Army’s Mobile Operations
Coordination Center (CCOp Mv) may compromise the success of operations, whether by stealing
sensitive information or by interrupting critical CCOp Mv systems, this dissertation
proposes a botnet detection mechanism based on the analysis of network flow behavior.
The main objective is to provide an additional layer of cyber protection for the CCOp
Mv IT infrastructure. The botnet detection technique used is the recently developed
Energy-based Flow Classifier (EFC). This technique uses inverse statistics to detect
anomalies and, importantly, adapts easily to new domains, which makes it a promising
technique for detecting unknown botnets. EFC infers its detection model from benign data
only and classifies as malicious any flow that deviates from the normal traffic pattern
learned during training.
Consequently, besides flows related to botnet activity, other types of malicious activity
may also be detected, so further verification is needed to identify the type of malicious
activity detected. Two heterogeneous datasets, CTU-13 and ISOT HTTP, were used to
evaluate the effectiveness of the model, and the results were compared with those of
several traditional algorithms. Preliminary results show that EFC performs well when
trained and tested in the same domain, and that, in cross-domain tests, it maintains
stable results regardless of the domain, unlike the other models tested.
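The classification rule summarized above (a model inferred from benign flows only, with any flow that deviates from it flagged as malicious) can be sketched as follows. This is a minimal illustration in the spirit of EFC, not the dissertation's implementation. It assumes that flow features have already been discretized into Q states each, infers a pairwise Potts model with the mean-field approximation (couplings taken from the negative inverse of the covariance matrix), and flags a flow whose energy exceeds a cutoff computed from the benign training energies. The function names fit_efc, energy and predict, the pseudocount of 0.5 and the 95% cutoff quantile are hypothetical choices made for illustration.

```python
import numpy as np


def _one_hot(X, Q):
    """One-hot encode an (M, N) array of states in 0..Q-1 into shape (M, N*Q)."""
    M, N = X.shape
    S = np.zeros((M, N * Q))
    S[np.arange(M)[:, None], np.arange(N) * Q + X] = 1.0
    return S


def fit_efc(X, Q, pseudocount=0.5, cutoff_quantile=0.95):
    """Infer a Potts model (couplings e, fields h) from benign flows X.

    X: (M, N) integers; each feature is assumed to be already discretized into Q states.
    Mean-field approximation: e = -C^{-1} on a reduced (gauge-fixed) basis.
    """
    M, N = X.shape
    S = _one_hot(X, Q)
    # Single-site and pairwise frequencies, mixed with a uniform pseudocount.
    f1 = (1 - pseudocount) * S.mean(axis=0) + pseudocount / Q
    f2 = (1 - pseudocount) * (S.T @ S) / M + pseudocount / Q**2
    for i in range(N):  # a feature takes exactly one state per flow
        blk = slice(i * Q, (i + 1) * Q)
        f2[blk, blk] = np.diag(f1[blk])
    C = f2 - np.outer(f1, f1)
    # Gauge fixing: drop the last state of each feature so the covariance is invertible.
    keep = np.arange(N * Q) % Q != Q - 1
    e = np.zeros((N * Q, N * Q))
    e[np.ix_(keep, keep)] = -np.linalg.inv(C[np.ix_(keep, keep)])
    for i in range(N):  # no coupling of a feature with itself
        blk = slice(i * Q, (i + 1) * Q)
        e[blk, blk] = 0.0
    # Fields: h_i(a) = ln(f_i(a) / f_i(Q)) - sum_{j,b} e_ij(a, b) f_j(b).
    ref = np.repeat(f1[Q - 1 :: Q], Q)
    h = np.log(f1 / ref) - e @ f1
    model = {"Q": Q, "e": e, "h": h}
    # Cutoff: an energy quantile of the benign training flows themselves.
    model["cutoff"] = np.quantile(energy(model, X), cutoff_quantile)
    return model


def energy(model, X):
    """H(x) = -sum_{i<j} e_ij(x_i, x_j) - sum_i h_i(x_i); higher means less benign-like."""
    S = _one_hot(X, model["Q"])
    # e is symmetric with zero diagonal blocks, so 0.5 * s^T e s counts each pair once.
    pair = 0.5 * np.sum((S @ model["e"]) * S, axis=1)
    return -pair - S @ model["h"]


def predict(model, X):
    """Return 1 (malicious) if a flow's energy exceeds the benign cutoff, else 0 (benign)."""
    return (energy(model, X) > model["cutoff"]).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "benign" traffic: 4 discretized features that usually share one state.
    base = rng.integers(0, 3, size=(2000, 1))
    benign = np.repeat(base, 4, axis=1)
    noise = rng.random(benign.shape) < 0.1
    benign[noise] = rng.integers(0, 3, size=noise.sum())
    # Toy "anomalous" traffic: features drawn independently of each other.
    anomalous = rng.integers(0, 3, size=(500, 4))
    model = fit_efc(benign, Q=3)
    print("flagged benign:", predict(model, benign).mean())
    print("flagged anomalous:", predict(model, anomalous).mean())
```

Lower energy means a flow is more consistent with the statistics of the benign training traffic. Because only benign data enters fit_efc, the same procedure can be re-run on traffic from a new domain without any labeled attacks, which is the adaptability property emphasized in the abstract. The toy data in the demo is synthetic and says nothing about performance on CTU-13 or ISOT HTTP.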