Volume 17, Number 1

Artificial Intelligence and Machine Learning Algorithms Are Used to Detect and Prevent Cyber Threats as Well as Their Potential Impact on the Future of Cybersecurity Practices

  Authors

Jiawei Zhang, Xin Zhang and Xinyin Miao, USA

  Abstract

This article presents a hybrid method for text classification that combines quantified topic modelling results with TF-IDF-based token features. By integrating both techniques, the model captures the contextual meaning of article text and improves overall classification performance. The hybrid approach quantifies the semantically meaningful words in each article abstract, applies K-means clustering to group topics, and then uses a gradient boosting model for the final classification task. Using five topics derived from a corpus of 750 articles, the proposed method improved the classification F1 score from 0.9121 to 0.9310 and the accuracy from 0.9115 to 0.9292. This approach offers a promising solution for long-form article classification, where topic modelling captures the core semantic meaning of paragraphs and complements the individual quantitative token features.
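
The pipeline described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn. The abstract does not specify how topics are quantified, so this sketch makes its own assumption: it clusters the TF-IDF vectors with K-means and uses each document's distance to every cluster centre as the topic features, which are then appended to the TF-IDF token features. The toy texts, labels, the helper name `hybrid_features` and all parameter values are hypothetical and are not taken from the article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

N_TOPICS = 2  # toy value; the abstract reports five topics on 750 articles

# Hypothetical toy abstracts and class labels, for illustration only.
train_texts = [
    "neural network training with gradient descent",
    "deep neural network for image recognition",
    "gradient descent optimises network weights",
    "image recognition with convolutional network layers",
    "stock market prices and interest rates",
    "interest rates affect bond market returns",
    "market volatility and stock returns",
    "bond prices fall when interest rates rise",
]
train_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

vectorizer = TfidfVectorizer(stop_words="english")
kmeans = KMeans(n_clusters=N_TOPICS, n_init=10, random_state=0)


def hybrid_features(texts, fit=False):
    """Return TF-IDF token features with K-means topic distances appended."""
    tfidf = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    # Assumption: distance to each cluster centre stands in for the
    # "quantified topic modelling results" mentioned in the abstract.
    topic = kmeans.fit_transform(tfidf) if fit else kmeans.transform(tfidf)
    return np.hstack([tfidf.toarray(), topic])


# Fit the feature extractors and the classifier on the training texts.
X_train = hybrid_features(train_texts, fit=True)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, train_labels)

# Classify an unseen text using the already fitted vectorizer and clusters.
X_new = hybrid_features(["rising interest rates hit the stock market"])
predicted = clf.predict(X_new)
print(predicted)
```

Dropping the `topic` columns from `hybrid_features` gives a TF-IDF-only baseline, so the two feature sets can be compared on the same train and test split.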

  Keywords

NLP, Text Classification, Topic Modelling, Machine Learning, Gradient Boosting