Abdelouahab Hocini and Kamel Smaïli, University of Lorraine, France
This work explores the use of Large Language Models (LLMs) for fake news detection in multilingual and multi-script contexts, focusing on Arabic dialects. We address the scarcity of digital data for many Arabic dialects by using LLMs pretrained on a diverse corpus that includes Modern Standard Arabic (MSA), followed by fine-tuning on dialect-specific data. We evaluate AraBERT, DarijaBERT, and mBERT on North African Arabic dialects, accounting for code-switching and for writing styles such as Arabizi. Experiments are conducted on the BOUTEF dataset, which includes fake news, fake comment, and denial categories. Our approach fine-tunes on both Arabic- and Latin-script text, with a focus on cross-script generalization. We improve accuracy with an ensemble strategy that merges the predictions of AraBERT and DarijaBERT. In addition, we introduce a new custom loss function, CALLM, which enforces consistency between the models and boosts classification performance. CALLM yields a significant improvement in F1-score (+12.88) and accuracy (+2.47) over the best single model (MarBERT).
Fake news detection, Large Language Models, Arabic dialects, Code-switching, Arabizi, AraBERT, DarijaBERT, mBERT, Ensemble learning, Consistency loss
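The abstract does not give CALLM's exact formulation. The following is a minimal PyTorch sketch of one plausible consistency loss, under the assumption that CALLM combines per-model cross-entropy with a symmetric KL-divergence term that penalizes disagreement between the AraBERT and DarijaBERT predictive distributions; the function name callm_loss and the consistency_weight parameter are hypothetical and not taken from the paper.

    import torch
    import torch.nn.functional as F

    def callm_loss(logits_a, logits_b, labels, consistency_weight=1.0):
        """Hypothetical sketch of a cross-model consistency loss.

        Combines each model's cross-entropy with a symmetric KL term
        that penalizes disagreement between the two predicted
        class distributions (e.g. AraBERT vs. DarijaBERT).
        """
        # Supervised terms: each model is trained on the gold labels.
        ce_a = F.cross_entropy(logits_a, labels)
        ce_b = F.cross_entropy(logits_b, labels)

        log_p_a = F.log_softmax(logits_a, dim=-1)
        log_p_b = F.log_softmax(logits_b, dim=-1)
        # Symmetric KL divergence between the two predictive distributions.
        kl_ab = F.kl_div(log_p_a, log_p_b.exp(), reduction="batchmean")
        kl_ba = F.kl_div(log_p_b, log_p_a.exp(), reduction="batchmean")
        consistency = 0.5 * (kl_ab + kl_ba)

        return ce_a + ce_b + consistency_weight * consistency

Setting consistency_weight to zero would recover independent fine-tuning of the two models; larger values push their predictions to agree, which is one way an ensemble of AraBERT and DarijaBERT could be made more coherent across scripts.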