Unveiling the Power of TAG Using Statistical Parsing for Natural Languages

Authors

Pavan Kurariya, Prashant Chaudhary, Jahnavi Bodhankar, Lenali Singh and Ajai Kumar, Centre for Development of Advanced Computing, India

Abstract

The Artificial Intelligence (AI) revolution began when machines became able to decipher enigmatic symbols concealed within messages. Subsequently, with the progress of Natural Language Processing (NLP), machines attained the capacity to understand human language. Tree Adjoining Grammar (TAG) has become a powerful grammatical formalism for processing large-scale grammars. However, TAG relies mostly on grammars created by language experts, and because of the structural ambiguity of natural languages, the computational complexity of TAG parsing is very high, O(n^6). We observed that the rule-based approach has serious flaws. First, language evolves with time, and it is impossible to create a grammar extensive enough to represent every structure of a language in the real world. Second, developing a practical solution takes too much time and too many language resources. These difficulties motivated us to explore an alternative to relying completely on the rule-based method. In this paper, we propose a statistical parsing algorithm for Natural Languages (NL) using the TAG formalism, in which the parser makes crucial use of a data-driven model to identify the syntactic dependencies of complex structures. We observed that using a probabilistic model along with limited training data can significantly improve both the quality and the performance of the TAG parser. We also demonstrate that the new parser outperforms the previous rule-based parser on a given sample corpus. Our experiments on many Indian languages provide further support for the claim that this approach might be an awaited solution for problems that require rich structural analysis of a corpus and the construction of syntactic dependencies for any natural language, without depending heavily on the manual process of creating a grammar for it. Finally, we present results of our ongoing research, in which the probability model is applied to select the appropriate adjunction at any given node of the elementary trees, and state chart representations are shared across derivations.

Keywords

Artificial Intelligence (AI), Natural Language Processing (NLP), Tree Adjoining Grammar (TAG), Natural Languages (NL)