Enhancing YOLOv8 for Infrared Object Detection via Learnable Multi-Scale Context and Attention

Volume 17, Number 1/2/3

Authors

Anthony Amankwah, Amankwah Consult, Germany

Abstract

Infrared (IR) object detection poses unique challenges due to low texture, weak edges, sensor noise, and a strong dependence on global context. Modern convolutional object detectors, including YOLOv8, are primarily optimized for RGB imagery and often underperform when directly applied to large-scale infrared datasets. In this paper, we present a simple yet effective architectural enhancement to YOLOv8 by (1) replacing the standard Spatial Pyramid Pooling Fast (SPPF) module with a learnable multi-scale variant (SPPFPlus), and (2) integrating the Convolutional Block Attention Module (CBAM) into both the backbone and neck. The proposed approach introduces explicit parallel multi-scale context modeling and adaptive channel–spatial attention, addressing key representational limitations of infrared imagery. Extensive experiments on a large infrared dataset demonstrate an improvement of approximately 15% in detection performance over the YOLOv8 baseline, with notable gains in recall and small-object detection. The modifications are lightweight, modular, and can be seamlessly integrated into existing YOLOv8
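The two components named in the abstract can be illustrated with a minimal PyTorch sketch. The CBAM part follows the commonly published design: channel attention from a shared MLP over average- and max-pooled descriptors, followed by spatial attention from a 7x7 convolution. The multi-scale part is only an assumption about what a "learnable multi-scale variant" of SPPF might look like: the class name `ParallelMultiScalePool`, the kernel sizes (5, 9, 13), and the softmax-weighted branch fusion are hypothetical and are not taken from the paper. Plain convolutions are used here in place of YOLOv8's Conv-BatchNorm-SiLU blocks to keep the sketch short.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """CBAM channel attention: a shared MLP over avg- and max-pooled descriptors."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # 1x1 convolutions act as a per-channel MLP on the pooled (C, 1, 1) descriptor.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """CBAM spatial attention: a conv over channel-wise avg and max maps."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(self.channel(x))


class ParallelMultiScalePool(nn.Module):
    """Hypothetical stand-in for SPPFPlus (not the paper's definition):
    parallel max-pool branches whose contributions are scaled by learnable weights."""

    def __init__(self, c_in: int, c_out: int, kernels=(5, 9, 13)):
        super().__init__()
        c_mid = c_in // 2
        self.reduce = nn.Conv2d(c_in, c_mid, kernel_size=1, bias=False)
        # Odd kernels with stride 1 and padding k // 2 preserve spatial size.
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels]
        )
        # One learnable scalar per branch (identity branch + one per pool).
        self.branch_weights = nn.Parameter(torch.ones(len(kernels) + 1))
        self.fuse = nn.Conv2d(c_mid * (len(kernels) + 1), c_out, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.reduce(x)
        branches = [x] + [pool(x) for pool in self.pools]
        w = torch.softmax(self.branch_weights, dim=0)
        scaled = [w[i] * b for i, b in enumerate(branches)]
        return self.fuse(torch.cat(scaled, dim=1))
```

Under these assumptions, both modules preserve the spatial size of the feature map, so something like `nn.Sequential(ParallelMultiScalePool(256, 256), CBAM(256))` could stand in wherever an SPPF block of matching channel width is used. Where exactly the paper places CBAM within the backbone and neck is not stated in the abstract.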

Keywords

YOLOv8, Attention, SPPFPlus