Semantic Document Classification based on Strategies of Semantic Similarity Computation and Correlation Analysis

Shuo Yang; Ran Wei; Hengliang Tan; Jiao Du

doi:10.5121/csit.2019.91301

Volume 9, Number 13, November 2019


Authors

Shuo Yang¹*, Ran Wei², Hengliang Tan¹ and Jiao Du¹; ¹Guangzhou University, China; ²University of California, USA

Abstract

Document (text) classification is a common method in e-business, facilitating tasks such as document collection, analysis, categorization and storage. Semantic analysis can help improve the performance of document classification. Although semantics has been considered in the design of previous automatic document classification methods, it deserves more attention as the number of content-rich electronic documents, forum posts and blogs online keeps increasing, because accurate automatic classification of such content can reduce human workload by a great margin. This paper proposes a novel semantic document classification approach that addresses two types of semantic problems: (1) the polysemy problem, using a novel semantic similarity computing strategy (SSC), and (2) the synonym problem, using a novel strong correlation analysis method (SCM). Experiments show that our strategies help improve the performance of the baseline methods.
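To make the two semantic problems concrete, the following toy Python sketch uses plain bag-of-words cosine similarity, a standard lexical baseline. It is not the paper's SSC or SCM, whose details are not described here. It shows that two documents expressing the same idea through synonyms score 0, while two unrelated documents that share the polysemous word "bank" receive a nonzero score. The synonym table (`SYNONYMS`), the helper names and the example sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts from lowercase whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical synonym table for illustration only (not the paper's SCM).
SYNONYMS = {"automobile": "car", "purchase": "buy"}

def normalize(text):
    """Map each token to a canonical synonym, if one is listed."""
    return " ".join(SYNONYMS.get(w, w) for w in text.lower().split())

# Synonym problem: same meaning, no shared tokens.
d1 = "buy a used car"
d2 = "purchase second-hand automobile"

# Polysemy problem: different meanings, one shared (ambiguous) token.
p1 = "river bank erosion"
p2 = "bank loan rates"

print(round(cosine(bow(d1), bow(d2)), 3))                        # 0.0
print(round(cosine(bow(normalize(d1)), bow(normalize(d2))), 3))  # 0.577
print(round(cosine(bow(p1), bow(p2)), 3))                        # 0.333
```

Mapping synonyms to a shared term raises the first pair's similarity from 0 to about 0.577, but a fixed lookup table cannot tell the two senses of "bank" apart, which is the kind of gap the polysemy-oriented strategy in the abstract is meant to address.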

Keywords

semantic document classification, semantic similarity, semantic embedding, correlation analysis, machine learning
