Academy & Industry Research Collaboration Center (AIRCC)

Volume 12, Number 14, August 2022

Author Identification using Traditional Machine Learning Models

  Authors

Ojaswi Binnani, International Institute of Information Technology-Hyderabad, India

  Abstract

The Internet places a wealth of useful information at our fingertips. However, it can also be put to nefarious uses such as cybercrime, fraudulent emails, content theft, and plagiarism. In many such cases the text is written anonymously, and accurately identifying its author is important for bringing the offender to justice. Author identification addresses this task: given a set of suspect authors, it determines which of them wrote a given text. We aim to build a computationally inexpensive model for finding the author of a given text, one that requires less training data than deep learning methods. This paper examines various stylometric and word-based features, together with several traditional machine learning models, to find the classifier that gives the best accuracy. We find that the XGBoost algorithm performs this task with good accuracy.
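The abstract does not list the exact features or model settings, so the following is only a minimal sketch of the general idea of stylometric author identification. It assumes four common stylometric features chosen here for illustration (mean word length, mean sentence length, type-token ratio, and the rate of `,;:` punctuation). It also swaps in a simple nearest-centroid classifier in place of XGBoost so that it runs with the Python standard library alone. The function names `stylometric_features`, `fit_centroids` and `predict_author` and the sample texts are hypothetical and are not the paper's method or data.

```python
import re
import statistics
from collections import defaultdict


# Hypothetical feature set for illustration; not the paper's exact features.
def stylometric_features(text):
    """Return [mean word length, mean sentence length, type-token ratio, punctuation rate]."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1  # guard against division by zero on empty input
    n_sents = len(sentences) or 1
    n_chars = len(text) or 1
    return [
        sum(len(w) for w in words) / n_words,         # mean word length (characters)
        len(words) / n_sents,                         # mean sentence length (words)
        len({w.lower() for w in words}) / n_words,    # type-token ratio (vocabulary richness)
        sum(text.count(c) for c in ",;:") / n_chars,  # rate of , ; : per character
    ]


def fit_centroids(texts, authors):
    """Average each suspect author's feature vectors (a nearest-centroid stand-in, not XGBoost)."""
    grouped = defaultdict(list)
    for text, author in zip(texts, authors):
        grouped[author].append(stylometric_features(text))
    return {
        author: [statistics.fmean(column) for column in zip(*vectors)]
        for author, vectors in grouped.items()
    }


def predict_author(centroids, text):
    """Return the suspect author whose centroid is closest in squared Euclidean distance."""
    x = stylometric_features(text)
    return min(
        centroids,
        key=lambda author: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[author])),
    )


if __name__ == "__main__":
    # Made-up training texts for two hypothetical suspects with contrasting styles.
    train_texts = [
        "I ran. He sat. We ate. It was hot.",
        "She hid. I saw. The dog ran off.",
        "Notwithstanding considerable hesitation, the committee, having deliberated "
        "extensively, ultimately recommended comprehensive institutional restructuring.",
        "Furthermore, extraordinarily complicated circumstances, unfortunately, "
        "necessitated substantial administrative reconsideration.",
    ]
    train_authors = ["A", "A", "B", "B"]
    centroids = fit_centroids(train_texts, train_authors)
    print(predict_author(centroids, "We sat. He ran. I hid."))
    print(predict_author(
        centroids,
        "Consequently, the administration, recognizing unprecedented organizational "
        "difficulties, authorized considerable investigation.",
    ))
```

Because the features are left unscaled, larger-valued ones such as sentence length dominate the distance. A fuller pipeline would standardize the features, add word-based features, and could replace the centroid rule with a gradient-boosted classifier such as `xgboost.XGBClassifier`, the family of model the abstract reports.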

  Keywords

Author Identification, Forensic Linguistics, Machine Learning.