Decoding the Encoded - Linguistic Secrets of Language Models: A Systematic Literature Review

doi:10.5121/csit.2023.131606

Authors

Hayastan Avetisyan and David Broneske, The German Centre for Higher Education Research and Science Studies (DZHW), Germany

Abstract

Language models' growing role in natural language processing necessitates a deeper understanding of their linguistic knowledge. Linguistic probing tasks, designed to evaluate models' understanding of various linguistic phenomena, have become crucial for model explainability. Objective: This systematic review critically assesses the linguistic knowledge of language models via linguistic probing, providing a comprehensive overview of the linguistic phenomena these models understand and identifying future research areas. Method: We performed an extensive search of relevant academic databases and analyzed 57 articles published between October 2018 and October 2022. Results: While language models exhibit extensive linguistic knowledge, limitations persist in their comprehension of specific phenomena. The review also points to a need for consensus on how to evaluate language models' linguistic knowledge and on the linguistic terminology used. Conclusion: Our review offers an extensive look into the linguistic knowledge of language models through linguistic probing tasks. This study underscores the importance of understanding these models' linguistic capabilities for effective use in NLP applications and for fostering more explainable AI systems.

Keywords

LLMs, linguistic knowledge, probing, analysis of LMs.

AIRCC
