Volume 13, Number 3

Data Standardization using Deep Learning for Healthcare Insurance Claims


Kaelan Renault and Amr R. Abdel-Dayem, Laurentian University, Canada


This paper presents a deep learning model that can be used for data standardization tasks. With applications such as insurance processing, accounting, and government forms processing, the ability to standardize data presented in a non-standard format would be impactful across many industries. Restaurant receipt images from the CORD dataset were used to build the model. These images had previously been processed with OCR, then pre-processed to create JSON files containing the OCR'd data and metadata for the information on those images. The main challenge in building the model is that the dataset is very small (1,000 images). To overcome this challenge, an augmentation stage was employed to generate additional training samples from the existing ones. While this standardization problem can be modelled as a classification task, it was decided to attempt a regression model that predicts the total on a receipt.
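The augmentation-plus-regression pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout, the `augment` helper, and the jitter-based perturbation are all assumptions standing in for CORD-style OCR output and the paper's actual augmentation stage.

```python
import random

random.seed(0)  # deterministic augmentation for this sketch

# Hypothetical, simplified receipt records standing in for CORD-style
# OCR'd JSON: each record lists line-item prices and the receipt total.
receipts = [
    {"items": [2.50, 4.00, 1.25], "total": 7.75},
    {"items": [10.00, 5.50], "total": 15.50},
]

def augment(record, n=5, jitter=0.05):
    """Generate n perturbed copies of a receipt by jittering item prices
    and recomputing the total, enlarging a small training set."""
    copies = []
    for _ in range(n):
        items = [round(p * (1 + random.uniform(-jitter, jitter)), 2)
                 for p in record["items"]]
        copies.append({"items": items, "total": round(sum(items), 2)})
    return copies

train = []
for r in receipts:
    train.append(r)
    train.extend(augment(r))

# A trivial regression baseline: predict the total as the sum of item
# prices scaled by one coefficient w, fitted by closed-form least squares.
num = sum(sum(r["items"]) * r["total"] for r in train)
den = sum(sum(r["items"]) ** 2 for r in train)
w = num / den

def predict_total(items):
    return round(w * sum(items), 2)
```

Since every augmented total is recomputed from its items, the fitted coefficient lands near 1.0; a real model would instead regress the total from the full OCR'd token sequence.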


Deep Learning, Machine Learning, Data Standardization, Regression Model, Data Augmentation, Healthcare Insurance.