Enhancing Machine Translation for Low-Resource Languages: A Cross-Lingual Learning Approach for Twi

Authors

Emmanuel Agyei, Zhang Xiaoling, Ama Bonuah Quaye, Odeh Victor Adeyi, and Joseph Roger Arhin, University of Electronic Science and Technology of China

Abstract

Machine Translation (MT) for low-resource languages such as Twi remains a significant challenge in Natural Language Processing (NLP) due to the scarcity of parallel datasets. Traditional methods rely heavily on high-resource data and often fail to serve low-resource languages adequately. To address this gap, we propose a fine-tuned T5 model trained with a Cross-Lingual Optimization Framework (CLOF), which dynamically adjusts gradient weights to balance the low-resource (Twi) and high-resource (English) datasets. The framework also incorporates federated training to enhance translation performance and to scale to other low-resource languages. The study uses a carefully aligned and tokenized English-Twi parallel corpus to make the most of the limited training data. Translation quality is evaluated using SPBLEU, ROUGE (ROUGE-1, ROUGE-2, and ROUGE-L), and Word Error Rate (WER). The pretrained mT5 model serves as a baseline against which the optimized model is compared. Experimental results show substantial improvements: SPBLEU increases from 2.16% to 71.30%, ROUGE-1 rises from 15.23% to 65.24%, and WER decreases from 183.16% to 68.32%. These findings highlight CLOF's potential for improving low-resource MT and advancing NLP for underrepresented languages, paving the way for more inclusive, scalable translation systems.
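To make the gradient-balancing idea concrete, the following is a minimal PyTorch sketch of one training step that weights a high-resource (English) loss against a low-resource (Twi) loss. The weighting rule shown here (each objective weighted by its share of the total loss, so the lagging task receives more gradient mass) is an illustrative assumption, not the exact CLOF formulation; TinyTranslator and weighted_step are hypothetical names, and a real setup would load a pretrained T5/mT5 checkpoint rather than this toy model.

import torch
import torch.nn as nn

class TinyTranslator(nn.Module):
    # Stand-in for a fine-tuned T5/mT5 model; in practice one would
    # load a pretrained checkpoint (e.g., via Hugging Face transformers).
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, vocab)

    def forward(self, tokens):
        return self.proj(self.embed(tokens))

def weighted_step(model, opt, hi_batch, lo_batch, loss_fn):
    # One optimization step balancing both corpora. Each batch is
    # (input_ids, target_ids). Losses are combined with weights
    # proportional to each task's current loss, so the harder
    # (typically low-resource) objective gets a larger gradient share.
    opt.zero_grad()
    hi_in, hi_tgt = hi_batch
    lo_in, lo_tgt = lo_batch
    hi_loss = loss_fn(model(hi_in).flatten(0, 1), hi_tgt.flatten())
    lo_loss = loss_fn(model(lo_in).flatten(0, 1), lo_tgt.flatten())
    total = hi_loss + lo_loss
    # Detach the weights so they act as fixed coefficients this step
    # and do not themselves receive gradients.
    w_hi = (hi_loss / total).detach()
    w_lo = (lo_loss / total).detach()
    (w_hi * hi_loss + w_lo * lo_loss).backward()
    opt.step()
    return hi_loss.item(), lo_loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyTranslator()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    # Toy batches: the low-resource batch is deliberately smaller.
    hi = (torch.randint(0, 1000, (8, 16)), torch.randint(0, 1000, (8, 16)))
    lo = (torch.randint(0, 1000, (2, 16)), torch.randint(0, 1000, (2, 16)))
    for step in range(3):
        print(weighted_step(model, opt, hi, lo, loss_fn))

Under these assumptions, detaching the weights is the key design choice: the mixing coefficients adapt each step to the relative difficulty of the two objectives without distorting the gradients of the losses themselves.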

Keywords

Low-Resource Machine Translation; Twi Language; Federated Learning; Cross-Lingual Learning; Fine-Tuning