Volume 13, Number 2

Forged Character Detection Datasets: Passports, Driving Licences and Visa Stickers


Teerath Kumar¹, Muhammad Turab², Shahnawaz Talpur², Rob Brennan¹ and Malika Bendechache¹, ¹Dublin City University, Ireland, ²Mehran University of Engineering and Technology, Pakistan


Forged documents, specifically passports, driving licences and VISA stickers, are used for fraudulent purposes including robbery, theft and many more. Detecting forged characters in documents is therefore an important and challenging task in digital forensic imaging. Forged character detection faces two major challenges. First, data for forged character detection is extremely difficult to obtain, for reasons including limited access to data, unlabeled data, and prior work being carried out on private data. Second, deep learning (DL) algorithms require labeled data, and labeling is tedious, time-consuming, expensive and requires domain expertise. To address these issues, in this paper we propose a novel algorithm that generates three datasets: forged characters detection for passport (FCD-P), forged characters detection for driving licence (FCD-D) and forged characters detection for VISA stickers (FCD-V). To the best of our knowledge, we are the first to release these datasets. The proposed algorithm starts by reading plain document images and simulates forgery on passports, driving licences and VISA stickers from five different countries. It then records the bounding boxes of the forged characters, which serve as the labels. Furthermore, to reflect real-world scenarios, we apply selected data augmentations. Each dataset consists of 15,000 images, each of size 950 x 550. For further research purposes we release our algorithm code¹ and the datasets, i.e. FCD-P², FCD-D³ and FCD-V⁴.
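The forging-and-labeling step described above can be illustrated with a minimal sketch. It is not the released algorithm: it assumes that character bounding boxes are already known, and it swaps in a simple copy-move operation (pasting another character from the same image) as the forgery. The names `forge_characters`, `resize_nearest` and `char_boxes` are hypothetical, and the augmentation step is not covered.

```python
import numpy as np


def resize_nearest(patch, height, width):
    """Nearest-neighbour resize using plain NumPy indexing."""
    rows = np.arange(height) * patch.shape[0] // height
    cols = np.arange(width) * patch.shape[1] // width
    return patch[rows][:, cols]


def forge_characters(image, char_boxes, n_forged, rng):
    """Overwrite n_forged character regions with other characters (copy-move).

    image      : H x W (x C) uint8 array of a clean document image
    char_boxes : list of (x, y, w, h) character boxes (assumed to be given)
    Returns the forged copy and the boxes of the forged characters (labels).
    """
    forged = image.copy()
    # Choose distinct target characters to be forged.
    targets = rng.choice(len(char_boxes), size=n_forged, replace=False)
    labels = []
    for t in targets:
        # Pick a different character as the source of the pasted patch.
        candidates = [i for i in range(len(char_boxes)) if i != t]
        s = rng.choice(candidates)
        sx, sy, sw, sh = char_boxes[s]
        tx, ty, tw, th = char_boxes[t]
        patch = image[sy:sy + sh, sx:sx + sw]
        forged[ty:ty + th, tx:tx + tw] = resize_nearest(patch, th, tw)
        # The target box is kept as the label of the forged character.
        labels.append((tx, ty, tw, th))
    return forged, labels


rng = np.random.default_rng(0)
# Random stand-in for a document image; 950 x 550 is assumed to be width x height.
page = rng.integers(0, 256, size=(550, 950), dtype=np.uint8)
# Hypothetical character boxes laid out on one text line.
boxes = [(100 + 30 * i, 200, 20, 30) for i in range(10)]
forged, labels = forge_characters(page, boxes, n_forged=3, rng=rng)
```

Because the labels are produced as a by-product of the forgery itself, no manual annotation is needed in this sketch; every pixel outside the returned boxes is left identical to the clean image.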


Character detection dataset, Deep learning forgery, Forged character detection.