CDFG: Enhancing Chain-of-Thought Distillation with Feedback

Authors

Lingzhi Gao 1, Xuan Wang 2, Tianrun Cai 3, Xiunao Lin 4 and Chao Wu 1; 1 Zhejiang University, China; 2 China Media Group, China; 3 University of Manchester, UK; 4 Zhejiang Post & Telecommunication Construction Co., Ltd., China

Abstract

Chain-of-thought (CoT) prompting has shown great potential in enhancing the reasoning capabilities of large language models (LLMs), and recent studies have explored distilling this ability into smaller models. However, existing CoT distillation methods often overlook student model errors as valuable learning signals. In this paper, we propose CDFG, a two-stage distillation framework that treats student errors as opportunities for improvement. After an initial imitation-based training phase, the teacher model analyzes the student's incorrect outputs and generates natural language feedback that highlights reasoning flaws and suggests correction strategies. The student model is then retrained on these feedback-augmented inputs. Experiments on several mathematical reasoning benchmarks demonstrate that CDFG consistently improves student model performance, showing that incorporating feedback-driven learning into CoT distillation can enhance reasoning accuracy.
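The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustrative outline, not the authors' implementation: all function names, the dictionary-based "student", and the toy feedback template are assumptions standing in for actual model training and teacher prompting.

```python
# Hedged sketch of the CDFG two-stage loop: (1) imitation training,
# (2) teacher feedback on student errors followed by retraining.
# A dict maps questions to answers as a stand-in for a trained student model.

def stage1_imitation(teacher_cots):
    """Stage 1: the student learns by imitating teacher CoT outputs.
    Here, 'training' is simply memorizing the teacher's answers."""
    return dict(teacher_cots)

def collect_errors(student, eval_set):
    """Run the student on an evaluation set and keep the items it gets wrong."""
    return [(q, gold) for q, gold in eval_set if student.get(q) != gold]

def teacher_feedback(errors):
    """Stage 2a: the teacher converts each error into natural language
    feedback (a toy template; the paper uses an LLM-generated critique)."""
    return {q: f"Check your reasoning; the correct answer is {gold}."
            for q, gold in errors}

def stage2_retrain(student, feedback):
    """Stage 2b: retrain the student on feedback-augmented inputs.
    In the real framework this would be gradient-based fine-tuning on
    (question + feedback) pairs; here we just patch the stored answer."""
    updated = dict(student)
    for q, fb in feedback.items():
        updated[q] = fb.split()[-1].rstrip(".")
    return updated
```

Usage: after `stage1_imitation`, errors collected on a held-out set are passed through `teacher_feedback` and `stage2_retrain`, so the retrained student corrects exactly the items the teacher gave feedback on while leaving its correct answers untouched.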

Keywords

Chain-of-thought distillation, Large language model, Reasoning