FRAMEWORK FOR TTP CLASSIFICATION BASED ON BERT TRANSFORMERS
Natural Language Processing; Cyber Intelligence; Tactics, Techniques and Procedures; Machine Learning
Information on Tactics, Techniques and Procedures (TTP) observed in an attack is important to cybersecurity defenders. However, it is mostly disseminated through unstructured text, hindering access and the work of cyber analysts. This work presents a framework for tackling this problem using BERT (Bidirectional Encoder Representations from Transformers), a model derived from the Transformer architecture. We use 11 variants of BERT, a state-of-the-art approach in Natural Language Processing, to classify sentences according to the MITRE ATT\&CK framework for TTP. The dataset used is MITRE's database of sentences (examples); part of it is used for training and part for evaluating the models. Validation is also performed against a set of manually annotated sentences extracted from public CTI reports. The effect of selected hyperparameters on the fine-tuning of the models is also investigated. The purpose is to identify the best model and the best combination of hyperparameters for the proposed classification task. As a result, we observed that the best models achieved accuracies of 82.64\% and 78.75\% on the two datasets tested, demonstrating the feasibility and potential of applying BERT models to the complex task of TTP classification. Finally, we analyze some of the sentences misclassified by the framework to better understand why the models err and thus gather insights about possibilities to further improve performance.
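To make the approach concrete, the sketch below shows one way the fine-tuning step could be set up with the Hugging Face transformers library: sentences are tokenized and a BERT checkpoint is fine-tuned as a sequence classifier over ATT\&CK technique labels. The checkpoint name (bert-base-uncased), the file mitre_examples.csv, its column names, and the hyperparameter values are illustrative assumptions, not the exact configuration used in this work.

# Minimal sketch of fine-tuning a BERT model for TTP sentence classification.
# Assumes a CSV of (sentence, technique_id) pairs derived from MITRE's examples;
# model name, file path, and hyperparameters below are illustrative only.
import pandas as pd
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

class TTPDataset(Dataset):
    def __init__(self, sentences, labels, tokenizer, max_len=128):
        # Tokenize all sentences up front with fixed-length padding
        self.enc = tokenizer(sentences, truncation=True, padding="max_length",
                             max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Hypothetical training file with columns: sentence, technique_id
df = pd.read_csv("mitre_examples.csv")
labels = sorted(df["technique_id"].unique())
label2id = {l: i for i, l in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

train_set = TTPDataset(df["sentence"].tolist(),
                       df["technique_id"].map(label2id).tolist(), tokenizer)

# Example hyperparameters (epochs, batch size, learning rate) to be tuned
args = TrainingArguments(output_dir="ttp-bert", num_train_epochs=4,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_set).train()

The same setup generalizes to the other BERT variants by swapping the checkpoint name, which is how different models and hyperparameter combinations can be compared on the held-out MITRE sentences and the manually annotated CTI report sentences.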