A Study into Math Document Classification using Deep Learning

Fatimah Alshamari; Abdou Youssef

doi:10.5121/csit.2020.101702

Volume 10, Number 17, December 2020

Authors

Fatimah Alshamari¹,² and Abdou Youssef¹; ¹The George Washington University, USA; ²Taibah University, KSA

Abstract

Document classification is a fundamental task for many applications, including document annotation, document understanding, and knowledge discovery. This is especially true in STEM fields, where the number of scientific publications grows exponentially and where document processing and understanding are essential to technological advancement. Classifying a new publication into a specific domain based on its content is costly and time-consuming, so there is high demand for a reliable document classification system. In this paper, we focus on the classification of mathematics documents, which consist of English text together with mathematical formulas and symbols. The paper addresses two key questions. The first is whether math-document classification performance is affected by math expressions and symbols, either alone or in conjunction with the text content of the documents. Our investigations show that Text-Only embedding produces better classification results. The second is how to optimize a deep learning (DL) model, an LSTM combined with a one-dimensional CNN, for math document classification. We examine the model under several input representations, key design parameters, and design decisions, including the choice of the best input representation for math document classification.
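
As an illustration of the first question, the three kinds of input (text alone, math alone, and text together with math) can be sketched as three token streams derived from the same document. This is a minimal sketch and not the preprocessing used in the paper: it assumes that math is delimited by single dollar signs as in LaTeX source, and the function name `split_views` and the tokenization rules are hypothetical choices made for the example.

```python
import re

# Assumption: math spans are delimited by single dollar signs, as in LaTeX source.
MATH_SPAN = re.compile(r"\$([^$]+)\$")
# Text tokens: runs of ASCII letters (lower-cased below).
TEXT_TOKEN = re.compile(r"[A-Za-z]+")
# Math tokens: a LaTeX command (\alpha), a letter run, a digit run, or one symbol.
MATH_TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z]+|\d+|\S")


def split_views(document):
    """Return text-only, math-only, and combined token lists for one document."""
    text_only, math_only, combined = [], [], []
    pos = 0
    for match in MATH_SPAN.finditer(document):
        # Plain text that precedes this math span.
        words = [w.lower() for w in TEXT_TOKEN.findall(document[pos:match.start()])]
        text_only.extend(words)
        combined.extend(words)
        # Tokens inside the math span, kept case-sensitive.
        symbols = MATH_TOKEN.findall(match.group(1))
        math_only.extend(symbols)
        combined.extend(symbols)
        pos = match.end()
    # Trailing text after the last math span.
    tail = [w.lower() for w in TEXT_TOKEN.findall(document[pos:])]
    text_only.extend(tail)
    combined.extend(tail)
    return {"text": text_only, "math": math_only, "text+math": combined}


views = split_views(r"Let $x^2 + \alpha$ be positive.")
for name, tokens in views.items():
    print(name, tokens)
```

Each of the three token lists could then be mapped to an embedding and passed to the same classifier, so that any difference in accuracy reflects the input representation rather than the model.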

Keywords

Math, document, classification, deep learning, LSTM.
