Volume 16, Number 5

Baoulé Related Parallel Corpora for Machine Translation Tasks: mtBCI-1.0

  Authors

Kouassi Konan Jean-Claude, Bircham International University, Spain

  Abstract

According to the Ethnologue platform, there are 7,164 known living languages in the world, and not all of them have data available on the internet to facilitate Artificial Intelligence (AI) tasks such as Machine Translation (MT). Consequently, most of these languages require thorough Data Engineering work. In particular, the Baoulé language (ISO 639-3 code: bci) is not yet supported on popular free translation platforms such as Microsoft Translator, nor on the official Wikipedias. In this paper, we propose the "Baoulé Related Parallel Corpora for Machine Translation tasks: mtBCI-1.0" to make parallel Baoulé-related datasets available to the scientific community for AI tasks involving Machine Translation. After a brief presentation of the Baoulé language, we focus on the Data Engineering process itself and then provide a baseline showing that the collected data is of scientific interest.

  Keywords

Artificial Intelligence, Machine Learning, Machine Translation, Data Engineering, Dataset, Parallel Corpora, Baoulé language (bci)