Volume 15, Number 3

PUMMP: Phishing URL Detection using Machine Learning with Monomorphic and Polymorphic Treatment of Features

  Authors

S. Chanti, T. Chithralekha, and K. S. Kuppusamy, Pondicherry University, India

  Abstract

Phishing scams are increasing drastically, putting Internet users at risk of having their personal credentials compromised. This paper proposes a novel feature utilization method for phishing URL detection based on what we call the polymorphic property of features: the same feature provides different interpretations when considered in different parts of the URL. In the initial stage, 46 URL-related features were extracted. A subset of 19 of these features exhibiting the polymorphic property was then identified, and these were extracted separately from different parts of the URL (the domain and the path). After feature extraction, various machine learning classification algorithms were applied to build models using monomorphic treatment of features, polymorphic treatment of features, and both treatments combined. The models were built on two different datasets. A comparison of the models derived from the two datasets shows that the model built with both monomorphic and polymorphic treatment of features yielded higher accuracy in phishing URL detection than existing works. While testing the model on phishing URL datasets, the most challenging task we noticed was detecting phishing URLs that carry a valid SSL certificate. Existing works that detect phishing URLs using only digital certificate-related features do not perform adequately on such URLs. To address this problem, we combined certificate-related and URL-related features to improve detection performance.
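
The distinction between monomorphic and polymorphic treatment can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four features below (length, dot count, hyphen count, digit count) and all function names are hypothetical stand-ins, since the abstract does not list the 46 features or the 19 polymorphic ones. Monomorphic treatment is assumed here to mean computing a feature once over the whole URL, and polymorphic treatment to mean computing the same feature separately for the domain and the path.

```python
from urllib.parse import urlsplit

# Hypothetical example features; the paper's actual feature set
# is not given in the abstract.
FEATURES = {
    "length": len,
    "dots": lambda s: s.count("."),
    "hyphens": lambda s: s.count("-"),
    "digits": lambda s: sum(ch.isdigit() for ch in s),
}


def monomorphic_features(url):
    """Compute each feature once, over the whole URL string."""
    return {f"url_{name}": fn(url) for name, fn in FEATURES.items()}


def polymorphic_features(url):
    """Compute the same features separately for the domain and the path."""
    parts = urlsplit(url)
    domain = parts.hostname or ""  # hostname is None when the URL has no netloc
    path = parts.path
    features = {}
    for name, fn in FEATURES.items():
        features[f"domain_{name}"] = fn(domain)
        features[f"path_{name}"] = fn(path)
    return features


def combined_features(url):
    """Merge both treatments into a single feature dictionary."""
    return {**monomorphic_features(url), **polymorphic_features(url)}


if __name__ == "__main__":
    # Made-up URL used purely for illustration.
    example = "http://secure-login.example.com/a.b/update-2024.php"
    for key, value in combined_features(example).items():
        print(key, value)
```

In this sketch, a single whole-URL count cannot tell whether dots or digits occur in the domain or in the path, while the per-part counts keep that information. The resulting dictionary could be fed to any standard classifier as a feature vector.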

  Keywords

Phishing, Anti-Phishing, Non-Content based approach, Monomorphic Features, Polymorphic Features, URL phishing, Credential Stealing