A Hierarchical Vision Approach for Enhanced Medical Diagnostics of Lung Tuberculosis using Swin Transformer

doi:10.5121/csit.2023.132209

Authors

Syed Amir Hamza and Alexander Jesser, Heilbronn University of Applied Sciences, Germany

Abstract

Lung tuberculosis remains a significant global health concern, and accurate detection of the disease from chest X-ray images is essential for early diagnosis and treatment. Our primary objective is to introduce an approach based on the Swin Transformer that helps physicians make more precise diagnostic decisions in less time. A further aim is to reduce the cost of testing by speeding up detection. The Swin Transformer is a state-of-the-art vision transformer that uses a hierarchical feature representation and a shifted window mechanism to enhance image understanding. We use the NIH Chest X-ray dataset, which consists of 1,557 images labeled as not having tuberculosis and 3,498 images depicting the disease. The dataset is randomly split into training, validation, and testing sets in a 64%, 16%, and 20% ratio, respectively. Our methodology preprocesses the images with random resized crop, horizontal flip, and normalization before converting them into tensors. The Swin Transformer model is trained for 50 epochs with a batch size of 8, using the Adam optimizer and a learning rate of 1e-5. We monitor the model's accuracy and loss during training and calculate the F1-score, precision, and recall to evaluate its performance. Training accuracy peaks at 0.88 at the 43rd epoch, while validation accuracy reaches its highest value of 0.88 after 20 epochs. On the test set, the model achieves a precision of 0.7928 and 0.9008, recall of 0.7749 and 0.9099, and F1-score of 0.7837 and 0.905 for the "Negative" and "Positive" classes, respectively. The Swin Transformer shows encouraging performance, and we anticipate that this architecture will be easily adaptable and has considerable potential to improve the speed and efficiency of physicians' diagnostic decisions in the future.
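
As a quick plausibility check on the figures quoted in the abstract, the following minimal Python sketch recomputes the per-class F1-score from the reported precision and recall, and derives approximate split sizes from the 64%/16%/20% ratio. The helper names f1_score and split_sizes are hypothetical and not taken from the paper. The rounding convention (floor for training and validation, remainder to testing) is also an assumption, since the abstract does not state exact per-split image counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_sizes(total: int, train_pct: int = 64, val_pct: int = 16) -> tuple:
    """Integer split sizes (assumed convention: floor train and val,
    give every remaining image to the test set)."""
    train = total * train_pct // 100
    val = total * val_pct // 100
    return train, val, total - train - val


# Precision and recall per class, as quoted in the abstract.
reported = {
    "Negative": (0.7928, 0.7749),
    "Positive": (0.9008, 0.9099),
}
f1_by_class = {name: f1_score(p, r) for name, (p, r) in reported.items()}

# Dataset size, as quoted in the abstract.
total_images = 1557 + 3498
train_n, val_n, test_n = split_sizes(total_images)

for name, value in f1_by_class.items():
    print(f"{name}: F1 = {value:.4f}")
print(f"total={total_images}, train={train_n}, val={val_n}, test={test_n}")
```

The recomputed F1 values round to 0.7837 and 0.9053, which agrees with the reported 0.7837 and 0.905. The split counts are only indicative, because a different rounding rule or a two-stage split would shift them by an image or two.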

Keywords

Lung tuberculosis, Medical diagnostics, Swin Transformer, Vision transformer, Hierarchical feature representation, Shifted window mechanism, Deep learning, Computer vision, Medical image analysis, NIH Chest X-ray dataset, Early diagnosis

AIRCC
