Vishing Detection System using Text-Based Analysis to Prevent Voice Phishing Scams

Volume 18, Number 2

Authors

Norhidayah Muhammad, Nor Aida Akma Alias, Siti Dhalila Mohd Satar, Nazirah Abd Hamid and Mumtazimah Mohamad, Universiti Sultan Zainal Abidin, Malaysia

Abstract

Vishing, or voice phishing, is a form of cyber-attack that has remained a persistent challenge. Scammers use psychological manipulation to circumvent conventional security defenses. Telecommunication fraud in Malaysia has risen sharply, with the Malaysian Police Force reporting a 47.3% increase in recorded cases from 2022 to 2023. According to records from the Commercial Crime Investigation Department (CCID) Malaysia, these incidents contributed substantially to losses of RM1.2 billion. In addition, statistics show that over 60% of fraud cases are carried out through sophisticated social engineering techniques that traditional network filters do not detect. Existing defenses have proved inadequate because they depend on outdated blacklists of phone numbers. This project builds an automated vishing detection system that analyzes two components: acoustic features (speech sounds) and text (message content). Using real recordings collected from YouTube and TikTok, audio patterns are extracted with Mel-Frequency Cepstral Coefficients (MFCC), and transcripts for text-based analysis are produced with OpenAI Whisper. Two Naïve Bayes classifiers are applied independently: one to the acoustic features and the other to the transcripts. Their outputs are combined by weighted fusion, with the text classifier weighted at 80% and the acoustic classifier at 20%. The system labels each call as fraud, suspicious, or genuine, and its performance is measured by accuracy, precision, recall, and F1-score.
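The 80/20 weighted combination described above can be illustrated with a minimal Python sketch. Only the two weights come from the abstract. The function names, the assumption that each classifier outputs a fraud probability in [0, 1], and the two decision thresholds are hypothetical and chosen purely for illustration; the abstract does not state how the "suspicious" band is defined.

```python
# Weights stated in the abstract: text 80%, acoustic 20%.
TEXT_WEIGHT = 0.8
ACOUSTIC_WEIGHT = 0.2

# Hypothetical cut-offs (assumption): the abstract does not give them.
FRAUD_THRESHOLD = 0.7
SUSPICIOUS_THRESHOLD = 0.4


def fuse_scores(p_text: float, p_acoustic: float) -> float:
    """Return the weighted average of the two fraud probabilities."""
    for p in (p_text, p_acoustic):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return TEXT_WEIGHT * p_text + ACOUSTIC_WEIGHT * p_acoustic


def label_call(p_text: float, p_acoustic: float) -> str:
    """Map the fused score to one of the three labels named in the abstract."""
    score = fuse_scores(p_text, p_acoustic)
    if score >= FRAUD_THRESHOLD:
        return "fraud"
    if score >= SUSPICIOUS_THRESHOLD:
        return "suspicious"
    return "genuine"
```

With these illustrative thresholds, a call whose transcript looks strongly fraudulent is flagged even when the acoustic evidence is weak, which reflects the heavier weight placed on text.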
An experiment on a sample of 32 audio recordings (17 fraud and 15 genuine) yielded an accuracy of 96.87%, a precision of 100%, a recall of 94.12%, and an F1-score of 97%. These results indicate that the hybrid method is feasible, despite constraints such as the small amount of training data.
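The reported figures can be cross-checked with a short Python sketch. The confusion-matrix counts below are not stated in the abstract; they are inferred on the assumption that the evaluation is binary (fraud versus genuine). A precision of 100% implies no false positives, and a recall of 94.12% on 17 fraud recordings implies 16 detected and 1 missed. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Inferred counts (assumption): 16 of 17 fraud calls caught, all 15 genuine calls cleared.
metrics = binary_metrics(tp=16, fp=0, fn=1, tn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these inferred counts, accuracy is 31/32 (96.875%, reported as 96.87%), recall is 16/17 (about 94.12%), and the F1-score is 32/33 (about 97%), which matches the figures quoted above.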

Keywords

Vishing Detection, Mel-Frequency Cepstral Coefficients (MFCC), Naïve Bayes classifier, OpenAI Whisper, Multimodal classification, Social Engineering