Detección de botnets y ransomware en redes de datos mediante técnicas de aprendizaje automático

Fernandez Maimo, Lorenzo

Supervisor:

Félix J. García Clemente (Supervisor)

Defense university: Universidad de Murcia

Defense date: 12 July 2019

Committee:

Juan Manuel Estévez Tapiador (Chair)
Pedro Enrique López de Teruel Alcolea (Secretary)
Jorge Maestre Vidal (Member)

Department:

Ingeniería y Tecnología de Computadores

Type: Thesis

Teseo: 152113

Abstract

Existing cyberdefense systems based on Intrusion Detection Systems (IDS) include (pro-)active approaches to anticipate and mitigate attacks that exploit vulnerabilities in computing systems. However, there are environments in which IDS have difficulty reaching their goal. For example, in mobile communications, the high transmission rates and large data volumes expected in future 5G technology will prevent current IDS from examining every packet in the network. Additionally, encrypted traffic is increasingly common, which prevents payload inspection.

Two of the most relevant cybersecurity threats are botnets and ransomware. Both generate rather characteristic network traffic patterns, which can be interpreted as anomalies in normal network traffic. In general, an anomaly can be defined as a pattern that does not follow the expected behavior considered normal. The main objective of this doctoral thesis is to research how to use machine learning techniques for anomaly detection in data networks with constraints. These constraints can be motivated, for example, by an enormous traffic volume (5G networks), encrypted traffic (clinical environments), or the requirement of automatic and real-time detection and mitigation, among others.

This doctoral thesis argues that a single netflow, without access to the packet payload, does not provide sufficient information; therefore, it proposes adding a context to the netflow to allow more accurate detection. This context is obtained from the netflows received in a given period of time preceding the netflow in question. Because netflows carry less information than full packets, the patterns to be detected are more complex, and machine learning algorithms are needed to identify them.
Moreover, this work argues that this netflow evaluation can be done at the rate required by demanding 5G networks, and that the detection/mitigation time is short enough to prevent ransomware spread. All this is integrated into a suitable architecture and performed in real time, in a dynamic and intelligent way.

To achieve these goals, the following methodology was applied:

- Critical analysis of the machine learning-based anomaly detection systems applied to data networks in the literature.
- Identification of scenarios where anomaly detection is challenging, by analyzing the feasibility of a netflow-based solution in these contexts.
- Thorough study of a selected set of suitable machine learning algorithms for each scenario.
- Design of an architecture based on NFV/SDN for each scenario, integrating anomaly detection and mitigation in a dynamic and flexible way, as well as in real time.
- Use of an existing public dataset suitable for evaluating the proposal, or creation of a new one made available to the scientific community.
- Experimental evaluation of the proposed architectures in terms of classification performance, resource consumption and detection/mitigation time.

The main results obtained in the development of this doctoral thesis are listed below.

- A novel way of calculating a feature vector associated with a netflow was presented. This feature vector incorporates aggregated information from the preceding netflows received in a time interval to provide a context for this netflow.
- An adaptive system based on NFV/SDN was proposed for the detection of anomalies in the context of 5G data networks. Integrated into this system is a two-level detection model based on deep learning. The lower level runs at the edge of the network, detecting symptoms of anomalies that the upper level uses to identify a potential global anomaly.
- Runtime performance measurements were obtained by evaluating the implementation of the edge model (a deep neural network) with the most popular deep learning development libraries. These measured times were used to demonstrate the adaptability of the proposed 5G architecture.
- It was determined that this deep neural network, using the feature vector mentioned above, is capable of detecting both known and unknown botnets.
- A second system based on NFV/SDN was introduced, capable of detecting, classifying and mitigating ransomware attacks in the hospital rooms of the future automatically, intelligently and in real time. This system builds on the designed feature vector and incorporates an entire life cycle that includes offline data acquisition and training, along with real-time detection and mitigation.
- The effectiveness of this proposal was shown for the detection and mitigation of known and unknown ransomware through extensive experiments carried out in a virtualized environment. To this end, a new dataset was generated from traffic captured in that environment and has been made available to the scientific community. The experiments demonstrated that the proposed method is able to prevent ransomware from spreading.
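
The idea of giving each netflow a context built from the netflows that preceded it within a time interval can be illustrated with a minimal Python sketch. This is not the feature vector defined in the thesis: the `Flow` fields, the 60-second window and the particular aggregates (flow count, mean bytes, distinct addresses and ports) are all assumptions chosen only to show the sliding-window mechanism.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified netflow record (all fields are assumptions)."""
    ts: float        # flow start time, in seconds
    src_ip: str
    dst_ip: str
    dst_port: int
    n_bytes: int
    n_packets: int


def context_features(flows, window=60.0):
    """Return one feature vector per flow: the flow's own counters plus
    aggregates over the flows seen in the `window` seconds before it."""
    window_flows = deque()   # preceding flows still inside the time window
    vectors = []
    for flow in sorted(flows, key=lambda f: f.ts):
        # Drop flows that fall outside the window relative to this flow.
        while window_flows and flow.ts - window_flows[0].ts > window:
            window_flows.popleft()
        n = len(window_flows)
        total_bytes = sum(f.n_bytes for f in window_flows)
        vectors.append([
            flow.n_bytes,                                   # this flow's bytes
            flow.n_packets,                                 # this flow's packets
            n,                                              # flows in the context
            total_bytes / n if n else 0.0,                  # mean bytes per context flow
            len({f.src_ip for f in window_flows}),          # distinct source addresses
            len({f.dst_ip for f in window_flows}),          # distinct destination addresses
            len({f.dst_port for f in window_flows}),        # distinct destination ports
            sum(f.src_ip == flow.src_ip for f in window_flows),  # context flows from same source
        ])
        window_flows.append(flow)
    return vectors
```

Each resulting vector could then be fed to a classifier in place of the raw netflow. A flow that arrives after a quiet period gets an empty context (zeros), whereas a flow that arrives during a burst of connections to many destinations gets high distinct-address counts, which is the kind of pattern a single netflow cannot express on its own.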