Takanori Asano and Yoshiaki Yasumura, Shibaura Institute of Technology, Japan
This paper presents a method for estimating the depth map of a scene from a single RGB image using a deep learning model that incorporates size perspective (size constancy cues). By utilizing size perspective, the proposed method addresses a key difficulty of depth estimation: the limited correlation between the information inherent to objects in RGB images (such as shape and color) and their corresponding depths. The proposed method consists of two deep learning models, a size perspective model and a depth estimation model. The size perspective model plays a role analogous to the size perspective cue, estimating an approximate depth for each object in the image from the size of the object's bounding box and its actual size. From these rough depth estimates (pre-depth estimation), a depth image representing the rough depth of each object (pre-depth image) is generated and input, together with the RGB image, into the depth estimation model. The pre-depth image serves as a hint for depth estimation and improves the performance of the depth estimation model. With the proposed method, depth inputs for the depth estimation model can be obtained without any device other than a monocular camera. The method improves accuracy when the image contains objects that the object detection model can detect. In experiments on an original indoor scene dataset, the proposed method demonstrated improved accuracy compared to the method without pre-depth images.
Depth Estimation, Deep Learning, Image Processing, Size Perspective, YOLOv8
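To make the pre-depth idea concrete, the following is a minimal sketch of generating a pre-depth image from detector output. It stands in for the paper's learned size-perspective model with the classical pinhole relation depth ≈ f · H_real / h_pixels; the focal length, the real-size table, and the function names are illustrative assumptions, not the authors' implementation.

```python
# Sketch of pre-depth image generation (assumption: a pinhole size-perspective
# approximation in place of the paper's learned size-perspective model).
import numpy as np

# Hypothetical real-world object heights in meters and focal length in pixels.
ASSUMED_HEIGHTS_M = {"chair": 0.9, "person": 1.7, "monitor": 0.5}
FOCAL_PX = 1000.0

def make_pre_depth(h, w, detections):
    """detections: list of (label, x1, y1, x2, y2) pixel boxes from an
    object detector such as YOLOv8."""
    pre_depth = np.zeros((h, w), dtype=np.float32)  # 0 marks unknown depth
    for label, x1, y1, x2, y2 in detections:
        real_h = ASSUMED_HEIGHTS_M.get(label)
        if real_h is None:
            continue  # no size prior for this class
        box_h = max(y2 - y1, 1)
        depth = FOCAL_PX * real_h / box_h  # pinhole size-perspective estimate
        pre_depth[y1:y2, x1:x2] = depth    # paint a rough per-object depth hint
    return pre_depth

# Example: a 1.7 m person whose box is 340 px tall comes out ~5 m away.
img = make_pre_depth(480, 640, [("person", 300, 100, 380, 440)])
```

The resulting pre-depth map would then be concatenated with the RGB image as an extra input channel to the depth estimation model, giving the network a coarse depth hint wherever an object was detected.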