Integrating Natural Language Processing with AdaBoost, Random Forest, and Logistic Regression for an Advanced Ensemble-Based Network Intrusion Detection Model

Putta Srivani, Himanshu Sharma, Rabins Porwal, T. Nagalakshmi, P.Mercy, Mallareddy Adudhodla, Nargis Parveen

Abstract

As network traffic grows in volume and complexity, more advanced methods are needed to identify security threats reliably. In this paper, we design an ensemble intrusion detection system that combines Natural Language Processing (NLP) with AdaBoost, Random Forest, and Logistic Regression to identify several kinds of network intrusions. The problem it addresses is that the detection rate of traditional Intrusion Detection Systems remains inadequate against increasingly strong attacks. NLP is applied to network traffic logs to extract the important features, which are then fed to each classifier in the ensemble. AdaBoost improves the performance of weak learners, Random Forest provides robustness, and Logistic Regression provides interpretability in the final decision making. The model is trained and tested on benchmark datasets such as NSL-KDD and CICIDS2017. Overall, the ensemble model outperformed every individual classifier in accuracy, precision, and recall, especially on rare attack types. Using NLP for feature extraction enabled the detection of sophisticated attack signatures, which suggests that the model is suited to real-time security monitoring in high-throughput networks such as corporate or cloud environments.
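The pipeline described in the abstract (text features extracted from traffic logs, then several classifiers whose outputs are combined) can be illustrated with a minimal sketch. The abstract does not say which NLP features are used or how the three classifiers are combined, so TF-IDF weighting and soft voting are assumptions made here for illustration only. The function names (tokenize, tfidf_vectors, soft_vote) are hypothetical, and the three probability rows in the usage lines are made-up stand-ins for the outputs of trained AdaBoost, Random Forest, and Logistic Regression models.

```python
import math
import re
from collections import Counter


def tokenize(log_line):
    """Split a raw log line into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", log_line.lower())


def tfidf_vectors(log_lines):
    """Return (vocabulary, one TF-IDF vector per log line).

    Assumption: TF-IDF is only one plausible choice of NLP feature.
    """
    docs = [Counter(tokenize(line)) for line in log_lines]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1
    idf = {
        tok: math.log((1 + n_docs) / (1 + sum(tok in doc for doc in docs))) + 1
        for tok in vocab
    }
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1  # avoid division by zero on empty lines
        vectors.append([doc[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def soft_vote(model_probs, weights=None):
    """Combine per-model class probabilities by weighted averaging.

    Assumption: soft voting is a stand-in for whatever combination
    rule the ensemble actually uses.
    Returns (index of the winning class, averaged probabilities).
    """
    weights = weights or [1.0] * len(model_probs)
    total = sum(weights)
    n_classes = len(model_probs[0])
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=avg.__getitem__), avg


# Usage: two toy log lines, then a vote over illustrative model outputs.
vocab, vectors = tfidf_vectors(["login failed root", "login ok alice"])
# Columns are [P(normal), P(attack)]; one row per hypothetical classifier.
label, avg = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
```

In this sketch a token that appears in every log line (here "login") receives the lowest weight, and the voted label follows the averaged probabilities rather than any single classifier.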
