doi:10.5121/ijnlc.2025.14501

Volume 14, Number 5

Cross-Lingual Statistical Parsing with Tree-Adjoining Grammar: A POS Enriched Extension for Robust Natural Language Processing

Authors

Pavan Kurariya, Prashant Chaudhary, Jahnavi Bodhankar and Lenali Singh, Centre for Development of Advanced Computing, India

Abstract

This paper presents an extended statistical parsing framework for Tree-Adjoining Grammar (TAG) that incorporates part-of-speech (POS) information to enhance syntactic disambiguation, improve accuracy, and increase cross-lingual adaptability. TAG provides a linguistically expressive mechanism for representing complex syntactic phenomena such as recursion and long-distance dependencies. However, conventional statistical TAG parsers remain largely constrained by their reliance on lexical anchors, which limits generalization across languages and leads to inefficiencies in ambiguous contexts. To address this, we extend the statistical TAG formalism by conditioning derivation decisions on both lexical items and their associated POS tags, thereby enriching the feature space with syntactic category information. Beyond the baseline framework, this extended version makes three major contributions. First, it integrates POS-based features into both generative and discriminative models, enabling robust handling of unseen or low-frequency lexical items. Second, it presents a cross-lingual evaluation using multilingual treebanks covering English-to-Indian-language pairs, demonstrating consistent improvements in parsing accuracy and a 40–45% reduction in parsing time compared to a conventional lexicalized TAG parser. Third, it provides an expanded analysis of computational efficiency, error patterns, and scalability across varying sentence lengths and language families. Experimental results on a dataset of 15,000 annotated sentences show that the proposed parser achieves significant gains in both accuracy and efficiency, with stable performance even in low-resource scenarios. The framework's design further allows integration with neural embeddings, opening pathways toward hybrid symbolic–neural parsing models.
Overall, the proposed POS-enriched cross-lingual TAG framework offers a scalable, linguistically grounded, and computationally efficient solution for modern Natural Language Processing (NLP) tasks, including machine translation, information extraction, and question answering.
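
The abstract does not state the exact form of the probability model, so the following is only a minimal sketch of the general idea of conditioning elementary-tree selection on both the lexical anchor and its POS tag. It assumes simple relative-frequency estimates and a fixed interpolation weight; the class name `PosBackoffTreeModel`, the `backoff_weight` parameter, and the tree labels are hypothetical and are not taken from the paper. The point it illustrates is how a POS-level distribution lets the parser score an unseen word.

```python
from collections import Counter, defaultdict


class PosBackoffTreeModel:
    """Toy estimate of P(tree | word, POS), backing off to P(tree | POS).

    Illustrative assumption only: not the authors' actual model.
    """

    def __init__(self, backoff_weight=0.5):
        # Weight on the lexicalized estimate when the (word, POS) pair was seen.
        self.backoff_weight = backoff_weight
        self.lex_counts = defaultdict(Counter)  # (word, pos) -> tree counts
        self.pos_counts = defaultdict(Counter)  # pos -> tree counts

    def observe(self, word, pos, tree):
        """Record one training instance of an elementary tree anchored by word/pos."""
        self.lex_counts[(word, pos)][tree] += 1
        self.pos_counts[pos][tree] += 1

    @staticmethod
    def _relative_frequency(counter, tree):
        total = sum(counter.values())
        return counter[tree] / total if total else 0.0

    def prob(self, word, pos, tree):
        """Interpolate lexical and POS estimates; use POS alone for unseen words."""
        p_pos = self._relative_frequency(self.pos_counts.get(pos, Counter()), tree)
        lex = self.lex_counts.get((word, pos))
        if not lex:
            return p_pos  # unseen lexical anchor: rely on the POS tag only
        p_lex = self._relative_frequency(lex, tree)
        lam = self.backoff_weight
        return lam * p_lex + (1 - lam) * p_pos


# Hypothetical training data: tree labels are placeholders.
model = PosBackoffTreeModel()
model.observe("eats", "VBZ", "alpha_transitive")
model.observe("eats", "VBZ", "alpha_transitive")
model.observe("sleeps", "VBZ", "alpha_intransitive")

seen = model.prob("eats", "VBZ", "alpha_transitive")
unseen = model.prob("devours", "VBZ", "alpha_transitive")
print(f"seen word: {seen:.3f}, unseen word: {unseen:.3f}")
```

In this sketch the unseen verb still receives a non-zero score because its POS tag carries category-level evidence, which is the kind of generalization a purely lexicalized model cannot provide.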

Keywords

Natural Language Processing (NLP), Tree-Adjoining Grammar (TAG), Part-of-Speech (POS), Cross-Lingual Parsing (CLP)