Anomaly Detection in IT Systems Log Files Using Unsupervised Algorithms
Keywords: anomaly detection, log, unsupervised learning
Information Technology (IT) systems traditionally record their activities in log files, which are often used for troubleshooting. However, manual analysis of these logs by system administrators often becomes impractical due to their intrinsic complexity and the high volume of data. In this study, we focus our investigation on anomaly detection in IT log records, aiming to automate the identification of the root cause of failures and vulnerabilities through the use of unsupervised Machine Learning techniques. To achieve this goal, we propose an architecture grounded in the literature to identify anomalies in log files using semantic vectorization. We conducted four experiments using the public Blue Gene/L (BGL) log dataset, where we evaluated the performance of eight unsupervised Machine Learning models. Additionally, we tested various configurations of word embeddings in semantic vectorization. Experimental results indicated that Deep Learning models, Self-Organizing Maps, and Autoencoders performed better, making them more suitable for practical real-world application. The main contributions of this work include the selection and testing of unsupervised Machine Learning models, followed by performance evaluation in complex environments. We also highlight the importance of practical applicability, exemplified by the proposed implementation for the second evaluation scenario, which uses logs from Microsoft Configuration Manager agents. This study not only presents advanced solutions but also emphasizes the need to consider the feasibility and effectiveness of these solutions in real-world scenarios, opening perspectives for future investigations.
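To make the pipeline the abstract sketches concrete (parse a BGL record, vectorize its message with word embeddings, score it with an unsupervised model, and only then compare against the dataset's alert label), here is an added example; it is not code from the paper. It assumes the public BGL column layout, uses constructed log records, a hand-assigned three-dimensional table standing in for pretrained word embeddings, and a nearest-neighbour distance score standing in for the paper's eight models (none of which it implements).

```python
import math

# Constructed BGL-style records (not lines from the real dataset) in the public
# BGL column layout: label, epoch, date, node, timestamp, node, type, component,
# level, message. Label "-" marks a non-alert line; any other value is an alert tag.
SAMPLE_LOGS = """\
- 1117838570 2005.06.03 R02-M1-N0-C:J12-U11 2005-06-03-15.42.50.363779 R02-M1-N0-C:J12-U11 RAS KERNEL INFO instruction cache parity error corrected
- 1117838573 2005.06.03 R02-M1-N0-C:J12-U11 2005-06-03-15.42.53.001234 R02-M1-N0-C:J12-U11 RAS KERNEL INFO instruction cache parity error corrected
- 1117838580 2005.06.03 R02-M1-N2-C:J15-U01 2005-06-03-15.43.00.445566 R02-M1-N2-C:J15-U01 RAS KERNEL INFO generating core.1596
- 1117838581 2005.06.03 R02-M1-N2-C:J15-U01 2005-06-03-15.43.01.778899 R02-M1-N2-C:J15-U01 RAS KERNEL INFO generating core.1597
KERNDTLB 1117838590 2005.06.03 R02-M1-N4-C:J03-U11 2005-06-03-15.43.10.112233 R02-M1-N4-C:J03-U11 RAS KERNEL FATAL data TLB error interrupt
""".splitlines()

# Hand-assigned stand-in for pretrained word embeddings; the 3 dimensions read as
# (hardware/memory term, severity, recovery). Tokens absent from the table
# (node ids, core.NNNN, addresses) are skipped, as rare tokens usually are.
EMBEDDINGS = {
    "instruction": (1.0, 0.0, 0.0), "cache": (1.0, 0.0, 0.0), "parity": (1.0, 0.2, 0.0),
    "data": (1.0, 0.0, 0.0), "tlb": (1.0, 0.0, 0.0),
    "error": (0.0, 1.0, 0.0), "interrupt": (0.0, 1.0, 0.0),
    "corrected": (0.0, -0.5, 1.0), "generating": (0.0, 0.0, 1.0),
}

def parse_bgl(line):
    """Split a BGL record into (alert label, message); the header has 9 fields."""
    fields = line.split(maxsplit=9)
    return fields[0], fields[9]

def vectorize(message):
    """Semantic vectorization: mean of the embeddings of the known tokens."""
    vecs = [EMBEDDINGS[t] for t in message.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0, 0.0)  # no known token: zero vector
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))

def nn_scores(vectors):
    """Unsupervised score: distance to the nearest other line (repeated templates score 0)."""
    return [min(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
            for i, v in enumerate(vectors)]

def detect(lines, threshold=0.25):
    """Flag isolated messages; labels are used only to evaluate, never to fit."""
    labels, msgs = zip(*(parse_bgl(l) for l in lines))
    scores = nn_scores([vectorize(m) for m in msgs])
    flagged = [i for i, s in enumerate(scores) if s > threshold]
    truth = {i for i, lab in enumerate(labels) if lab != "-"}
    hits = len(truth.intersection(flagged))
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truth) if truth else 0.0
    return flagged, precision, recall, scores
```

The threshold is a fixed constant here; in practice it is tuned on the score distribution of unlabeled training data, and the FATAL line is isolated only because its message tokens (severity with no recovery term) sit far from the repeated INFO templates.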