Volume 15, Number 1

Phishing URL Detection using LSTM Based Ensemble Learning Approaches

  Authors

Bireswar Banik and Abhijit Sarma, Gauhati University, India

  Abstract

The growing number of phishing attacks poses a significant challenge for cybersecurity personnel. Phishing is a deceitful attempt to steal confidential information from an organization or an individual. Many anti-phishing solutions have been developed over the years, but attackers keep devising new manoeuvres. Many existing techniques were evaluated on a limited set of URLs and depend on other software to collect domain-related information about the URLs. In this paper, aiming to build a more accurate and effective phishing attack detection system, we use ensemble learning with Long Short-Term Memory (LSTM) models. We propose ensembles of LSTM models built with a bagging approach and a stacking approach. Classification with the LSTM method requires no separate feature extraction. The ensemble models are built by integrating the predictions of multiple LSTM models. The performance of the proposed ensemble LSTM methods is compared with that of five different machine learning classification methods. To implement these machine learning algorithms, different URL-based lexical features are extracted, and a Mutual Information based feature selection algorithm is used to select the more relevant features for classification. Both the bagging and the stacking approaches of ensemble learning using LSTM models outperform the other machine learning techniques. The results are also compared with other anti-phishing solutions implemented using deep learning methods. Our approaches prove to be more accurate, with a low false positive rate of less than 0.15%, while being evaluated on a comparatively larger dataset.

  Keywords

Cyber security, Phishing attack, Machine learning, LSTM, Ensemble learning.