Machine Learning Models for Cybersecurity Risk Analysis in U.S. Critical Infrastructure Systems

Md Habibul Arif, Habibor Rahman Rabby, Nusrat Yasmin Nadia, Md Zahid Hassan, Rasel Hossain Babu

Abstract

The growing sophistication and frequency of cyber attacks on critical infrastructure in the United States make robust cybersecurity tools increasingly important. In this paper, we study the use of Artificial Intelligence (AI) to protect critical infrastructure, focusing on machine learning (ML) models for attack detection. The analysis is based on a dataset of network session attributes: network packet size (mean: 500.43, standard deviation: 198.38), login attempts (mean: 4.03, standard deviation: 1.96), session duration (mean: 792.75, standard deviation: 786.56), IP reputation score (mean: 0.33, standard deviation: 0.18), failed logins (mean: 1.52, standard deviation: 1.03), and access at odd times (mean: 0.15, standard deviation: 0.36). Four machine learning models were evaluated on the dataset: Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost. Model performance was measured using accuracy, precision, recall, F1 score, and Area Under the Curve (AUC). XGBoost achieved the best performance, with a test accuracy of 88.52% and an AUC of 0.88. Random Forest followed closely, with a test accuracy of 88.41% and an AUC of 0.88. Logistic Regression performed worst, with a test accuracy of 73.11% and an AUC of 0.79. Furthermore, calibration curves showed that XGBoost made overconfident predictions for attacks, whereas SVM and Logistic Regression were well calibrated, albeit slightly underestimating attacks. Feature distribution analysis found that login attempts and session duration were the most important factors in separating attack from non-attack cases.
This research shows that ML-based, AI-driven cybersecurity solutions are effective in protecting critical infrastructure, and it offers insights for improving cybersecurity resilience with such solutions. Future research should focus on deploying more advanced techniques and on addressing the regulatory barriers to AI adoption.
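
The evaluation metrics and the calibration check named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 decision threshold, the number of bins, and the toy labels and scores are all assumptions made for illustration, and none of the values come from the paper's dataset.

```python
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_score: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 at a fixed decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (attack, non-attack) pairs in which the attack
    receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_bins(y_true: Sequence[int], y_score: Sequence[float],
                     n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Per non-empty bin: (mean predicted probability, observed attack rate,
    sample count). A well-calibrated model has the first two values close."""
    score_sum = [0.0] * n_bins
    attack_sum = [0] * n_bins
    count = [0] * n_bins
    for t, s in zip(y_true, y_score):
        i = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        score_sum[i] += s
        attack_sum[i] += t
        count[i] += 1
    return [(score_sum[i] / count[i], attack_sum[i] / count[i], count[i])
            for i in range(n_bins) if count[i]]


# Hypothetical labels (1 = attack) and predicted attack probabilities.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]

print(classification_metrics(toy_labels, toy_scores))
print("AUC:", roc_auc(toy_labels, toy_scores))
for mean_pred, observed, n in calibration_bins(toy_labels, toy_scores):
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={n}")
```

In this sketch, a bin whose mean predicted probability is well above its observed attack rate corresponds to the overconfidence the abstract reports for XGBoost, while a predicted probability slightly below the observed rate corresponds to the mild underestimation reported for SVM and Logistic Regression.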
