Classification of Network Traffic using Machine Learning Models on the NetML Dataset

Volume 17, Number 3

Authors

Mezati Messaoud, Kasdi Merbah University, Algeria

Abstract

Network traffic classification plays a critical role in cybersecurity, quality of service (QoS) management, and anomaly detection. Traditional rule-based classification methods struggle with the increasing complexity and volume of network traffic, which motivates the adoption of machine learning (ML) techniques. In this study, we evaluate ML models for classifying network traffic on the NetML dataset, a benchmark dataset that captures diverse traffic patterns, including benign and malicious activities. We preprocess the dataset with feature selection, normalization, and data balancing to optimize model performance. We train and evaluate traditional classifiers, namely Random Forest (RF), Support Vector Machines (SVM), and K-Nearest Neighbors (KNN), as well as deep learning models such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. Model performance is assessed using accuracy, precision, recall, F1-score, and AUC-ROC. Experimental results show that deep learning models, particularly LSTM networks, capture temporal dependencies in network traffic and significantly outperform traditional classifiers. The LSTM, Gated Recurrent Unit (GRU), and CNN models all achieved an accuracy of 92.26%, highlighting their effectiveness in network traffic classification. Additionally, feature selection improved computational efficiency without compromising classification performance. However, confusion matrix analysis revealed that the models tend to predict the most frequent class, leading to potential bias and lower accuracy for minority classes. Some confusion matrix entries exceed 70,000, indicating dataset imbalance and model bias toward dominant classes.
Despite the high overall accuracy, misclassification challenges persist, particularly in identifying encrypted traffic and polymorphic attacks. Transformer-based models demonstrated resilience to adversarial modifications but required significantly higher computational resources. Future work should explore adversarial training, self-supervised learning, and hybrid CNN-LSTM architectures to enhance robustness against evolving cyber threats. Additionally, feature selection optimization and hyperparameter tuning can further refine classification performance, ensuring more reliable deployment in real-world cybersecurity applications.
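
The gap between high overall accuracy and weak minority-class performance described in the abstract can be illustrated with a small sketch. The class names and sample counts below are hypothetical and are not taken from the NetML dataset or from the reported experiments; the "model" is simply assumed to predict the most frequent class for every flow.

```python
# Illustrative sketch only: labels and counts are hypothetical, not NetML data.

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def per_class_recall(matrix):
    """Recall for each true class; 0.0 if the class has no samples."""
    recalls = []
    for i, row in enumerate(matrix):
        support = sum(row)
        recalls.append(row[i] / support if support else 0.0)
    return recalls

# Hypothetical imbalanced label distribution.
labels = ["benign", "malware", "scan"]
y_true = ["benign"] * 900 + ["malware"] * 60 + ["scan"] * 40

# Assumed degenerate model: always predicts the most frequent class.
y_pred = ["benign"] * len(y_true)

matrix = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(matrix)
recalls = per_class_recall(matrix)
balanced = sum(recalls) / len(recalls)  # macro-averaged recall

print("confusion matrix:", matrix)
print("accuracy:", acc)
print("per-class recall:", dict(zip(labels, recalls)))
print("balanced accuracy:", balanced)
```

In this toy setting the accuracy equals the share of the majority class, while recall for every minority class is zero. Reporting per-class recall or balanced accuracy alongside overall accuracy makes this kind of bias visible.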

Keywords

Machine Learning, Network Traffic Classification, NetML Dataset, Deep Learning, Cybersecurity
