Multimodal Cascaded Approach for Hierarchical Logo Tagging in Packaging Artwork Files

doi:10.5121/csit.2025.151808


Authors

Shishir Maurya, Anshul Verma, Yugal Gopal Sharma and Dhanush Dharmaretnam, SGS&CO, USA

Abstract

This study proposes a novel method for recognizing and categorizing logos in packaging artwork to address the automation demands of the printing and packaging industry. The approach cascades a trained object detection model for logo detection with a fine-tuned Vision Language Model (VLM) for hierarchical tag generation, achieving high precision across seven primary categories: sustainability, health and safety, branding, material identification, eco-friendly certification, social media, and compliance, with all others grouped under "others." In the first step, YOLOv8 detects logos and assigns them to primary categories, achieving a mean average precision (mAP) of 0.58 at an Intersection over Union (IoU) threshold of 0.5. In the second step, a fine-tuned VLM generates granular tags for the detected logos. Notably, Low-Rank Adaptation (LoRA) applied to the Florence-2-DocVQA model (with r = 64 and α = 128) surpassed the zero-shot performance of state-of-the-art VLMs, achieving a 24-fold improvement with a ROUGE-L F1 score of 0.72. This study also demonstrates the cost effectiveness and practicality of smaller models with fewer parameters, which perform comparably to larger VLMs while incurring much lower training and operational costs. These advancements streamline design and print production workflows, improve compliance tracking, and enhance brand management, contributing to greater automation in the packaging and printing industry.
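The two metrics quoted above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the box format (x1, y1, x2, y2), the whitespace tokenization, the lowercasing, and the function names `iou`, `lcs_length`, and `rouge_l_f1` are all assumptions made for illustration. IoU measures the overlap between a predicted and a ground-truth box (a detection typically counts as correct when IoU is at least 0.5), and ROUGE-L F1 scores a generated tag against a reference tag via their longest common subsequence.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def lcs_length(x: Sequence[str], y: Sequence[str]) -> int:
    """Length of the longest common subsequence, using one DP row at a time."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over whitespace tokens (tokenization is an assumption)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical boxes and tags, chosen only to exercise the functions.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(rouge_l_f1("forest stewardship council logo", "forest council logo"))
```

In practice an established implementation (for example a maintained ROUGE package with stemming options) would likely be preferable; the sketch only makes the definitions concrete.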

Keywords

Packaging Artwork, VLMs, Artwork Tagging, Low Rank Adaptation (LoRA)

AIRCC
