Emmanuel Agyei, Zhang Xiaoling, Ama Bonuah Quaye, Odeh Victor Adeyi and Joseph Roger Arhin, University of Electronic Science and Technology, China
Machine Translation (MT) for low-resource languages such as Twi remains a significant challenge in Natural Language Processing (NLP) due to limited parallel datasets. Traditional methods, which rely heavily on high-resource data, often struggle and fail to adequately serve low-resource languages. To address this gap, we propose a fine-tuned T5 model trained with a Cross-Lingual Optimization Framework (CLOF), which dynamically adjusts gradient weights to balance the low-resource (Twi) and high-resource (English) datasets. The framework incorporates federated training to enhance translation performance and to scale to other low-resource languages. The study uses a carefully aligned and tokenized English-Twi parallel corpus to make the most of the available training data. Translation quality is evaluated with SPBLEU, ROUGE (ROUGE-1, ROUGE-2, and ROUGE-L), and Word Error Rate (WER). A pretrained mT5 model serves as the baseline against which the efficacy of the optimized model is demonstrated. Experimental results show substantial improvements: SPBLEU increases from 2.16% to 71.30%, ROUGE-1 rises from 15.23% to 65.24%, and WER decreases from 183.16% to 68.32%. These findings highlight CLOF's potential to improve low-resource MT and advance NLP for underrepresented languages, paving the way for more inclusive, scalable translation systems.
Low-Resource Machine Translation; Twi Language; Federated Learning; Cross-Lingual Learning; Fine-Tuning
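To make the dynamic gradient-weighting idea behind CLOF concrete, the following is a minimal sketch of how losses from a high-resource (English) batch and a low-resource (Twi) batch might be combined with a weight that shifts toward the language whose loss is currently larger. The weighting rule, variable names, and stand-in model are illustrative assumptions, not the exact formulation described in this paper.

```python
# Hypothetical sketch of dynamic loss/gradient weighting between a low-resource
# (Twi) batch and a high-resource (English) batch; the alpha heuristic is assumed.
import torch
from torch import nn

model = nn.Linear(512, 512)  # stand-in for a T5-style translation model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def clof_step(loss_twi: torch.Tensor, loss_en: torch.Tensor) -> torch.Tensor:
    """Combine low- and high-resource losses with a dynamic weight (assumed heuristic)."""
    # Weight the low-resource loss by its relative magnitude so Twi gradients are
    # not drowned out by the much larger English dataset.
    alpha = (loss_twi / (loss_twi + loss_en + 1e-8)).detach()
    return alpha * loss_twi + (1.0 - alpha) * loss_en

# Example usage with dummy regression losses for a single optimization step.
loss_twi = nn.functional.mse_loss(model(torch.randn(8, 512)), torch.randn(8, 512))
loss_en = nn.functional.mse_loss(model(torch.randn(32, 512)), torch.randn(32, 512))

optimizer.zero_grad()
clof_step(loss_twi, loss_en).backward()
optimizer.step()
```

In a federated setting, a step like this could run per client before weight aggregation, but the exact aggregation scheme used by CLOF is described later in the paper rather than in this sketch.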