Youssef Alothman¹ and Mohamed Bader-El-Den², ¹University of Portsmouth, Portsmouth, UK, ²Abdullah Al Salem University, Kuwait
The semiconductor manufacturing sector produces enormous amounts of textual data that is highly imbalanced, non-stationary, and operationally critical. Although transformer-based language models achieve strong classification accuracy, their robustness and probability calibration under industrial constraints remain insufficiently addressed, particularly in resource-limited deployments. This paper proposes LiteFormer, a lightweight, calibrated transformer framework for imbalanced industrial text classification. The framework combines D-SMOTE, a geometry-informed minority over-sampling technique, with imbalance-aware optimization via focal loss and post-hoc probability calibration via temperature scaling. On a large-scale industrial Root Cause Analysis dataset, LiteFormer outperforms standard transformer baselines, achieving higher macro-F1 and substantially lower Expected Calibration Error while remaining computationally efficient, and it stays robust under temporal and domain shift.
Imbalanced text classification, lightweight transformers, probability calibration, focal loss, industrial NLP
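To make two of the abstract's components concrete, the following is a minimal NumPy sketch of focal loss and post-hoc temperature scaling. It is an illustrative reconstruction of the standard formulations, not the paper's implementation; the function names, the binary form of the loss, and the default values of gamma and alpha are assumptions.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy, well-classified examples.

    probs   -- predicted probability of the positive class, shape (n,)
    targets -- 0/1 ground-truth labels, shape (n,)
    gamma   -- focusing parameter; gamma=0 recovers (alpha-weighted)
               cross-entropy
    alpha   -- class-balancing weight for the positive class
    """
    p_t = np.where(targets == 1, probs, 1.0 - probs)   # prob of true class
    a_t = np.where(targets == 1, alpha, 1.0 - alpha)   # per-class weight
    return float(-np.mean(a_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def temperature_scale(logits, T):
    """Post-hoc calibration: softmax over logits divided by temperature T.

    T > 1 softens over-confident predictions; T is typically fitted on a
    held-out validation set by minimizing negative log-likelihood.
    """
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

With gamma = 2, a confidently correct example (p_t = 0.9) contributes only (1 - 0.9)^2 = 1% of its cross-entropy weight, so training gradient is concentrated on the hard minority examples; temperature scaling then adjusts confidence without changing the predicted class.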