Academy & Industry Research Collaboration Center (AIRCC)

Volume 10, Number 04, April 2020

Managing the Syntactic Blindness of Latent Semantic Analysis

  Authors

Raja Muhammad Suleman and Ioannis Korkontzelos, Edge Hill University, United Kingdom

  Abstract

Natural Language Processing (NLP) is a sub-field of Artificial Intelligence concerned with analysing and representing human language automatically. NLP has been employed in many applications, such as information retrieval, information processing and automated answer grading. Several approaches, commonly known as semantic analysis, have been developed for understanding the meaning of text. Latent Semantic Analysis (LSA) is a widely used corpus-based approach that evaluates the similarity of texts on the basis of semantic relations among words, and it has been used successfully in different language systems for calculating semantic similarity. However, LSA ignores the structural composition of sentences and therefore suffers from the syntactic blindness problem. First, LSA fails to distinguish between sentences that contain semantically similar words but have completely opposite meanings. Second, because it is blind to the syntactic structure of a sentence, LSA cannot differentiate between a sentence and a list of keywords: comparing a sentence with a list of keywords that has no syntactic structure yields a high similarity score. In this research we propose an algorithmic extension to LSA that focuses on the syntactic composition of a sentence to overcome LSA's syntactic blindness problem. We tested our approach on sentence pairs containing similar words but having different meanings. Our results showed that our extension provides more realistic semantic similarity scores.
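The syntactic blindness described above can be illustrated with a minimal Python sketch. It is not the authors' extension and does not run a full LSA pipeline; it only builds the bag-of-words term vectors that LSA starts from. In standard LSA, a new text is folded into the latent space through a fixed linear map of its term vector q (q_hat = Sigma_k^-1 U_k^T q), so texts with identical term vectors receive identical latent vectors, and therefore a cosine similarity of 1.0, whatever SVD is used. The stopword list and the example sentences are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Assumed preprocessing: a tiny stopword list, as is common in LSA pipelines.
STOPWORDS = {"the", "a", "an"}

def tokens(text):
    """Lower-case, split on whitespace and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def term_vector(text, vocab):
    """Bag-of-words counts over a fixed vocabulary; word order is discarded here."""
    counts = Counter(tokens(text))
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

s1 = "The dog bit the man"
s2 = "The man bit the dog"  # opposite meaning, same words
kw = "man dog bit"          # keyword list with no syntactic structure

vocab = sorted(set(tokens(s1)) | set(tokens(s2)) | set(tokens(kw)))
v1, v2, vk = (term_vector(t, vocab) for t in (s1, s2, kw))

print(v1 == v2, v1 == vk)        # True True
print(round(cosine(v1, v2), 3))  # 1.0
print(round(cosine(v1, vk), 3))  # 1.0
```

Both the reversed sentence and the bare keyword list are indistinguishable from the original sentence at the term-vector level, so no choice of latent dimensions can separate them; addressing this requires information about sentence structure beyond word counts.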

  Keywords

Natural Language Processing, Natural Language Understanding, Latent Semantic Analysis, Semantic Similarity.