RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts

doi:10.5121/ijnlc.2023.12303

Volume 12, Number 3

Authors

V. Indumathi, S. SanthanaMegala, Rathnavel Subramaniam College of Arts and Science, India

Abstract

Cyberbullying is currently one of the most important research fields. Most research on bullying-text identification has addressed English texts or comments, and, owing to the scarcity of data, analyzing and stemming Tamil text is frequently a tedious job. Tamil is a morphologically rich and agglutinative language, so creating a Tamil stemmer is not an easy undertaking. After examining the major difficulties involved, this paper proposes the rule-based iterative preprocessing algorithm (RBIPA). In this approach, Tamil morphemes and lemmas are extracted using the suffix-stripping technique, and a supervised machine learning algorithm classifies words as pronouns or proper nouns. The novelty of the proposed system is a preprocessing algorithm that performs iterative stemming and lemmatization to recover the exact root words in Tamil-language comments. RBIPA achieves 84.96% accuracy on a test dataset of 13,000 words.
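To make "iterative suffix stripping" concrete, the sketch below shows the general technique only, under stated assumptions. It is not RBIPA's rule set: the three rules, the `MIN_STEM_LEN` guard, and the function names `strip_once` and `iterative_stem` are hypothetical, and the supervised pronoun/proper-noun classifier is not modeled. Each rule pairs a suffix with a replacement. The replacement restores the virama (U+0BCD) that a case suffix absorbs when it attaches, so the stem can be stripped again on the next pass. For example, வீடுகளில் ("in the houses") reduces to வீடுகள் and then to வீடு.

```python
"""Minimal sketch of iterative suffix stripping for Tamil (hypothetical rules, not RBIPA's)."""
from typing import Optional

# Hypothetical (suffix, replacement) rules, tried in order on each pass.
# A replacement of "்" (virama, U+0BCD) restores the consonant ending that
# the case suffix absorbed, e.g. ...ள் + இல் is written ...ளில்.
SUFFIX_RULES = [
    ("ுக்கு", "்"),  # dative -ukku: மாணவர்களுக்கு -> மாணவர்கள்
    ("ில்", "்"),    # locative -il: வீடுகளில் -> வீடுகள்
    ("கள்", ""),     # plural -kal: வீடுகள் -> வீடு
]

MIN_STEM_LEN = 2  # hypothetical guard against over-stripping, counted in code points


def strip_once(word: str) -> Optional[str]:
    """Apply the first rule whose suffix matches; return None if no rule fires."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if len(stem) >= MIN_STEM_LEN:
                return stem + replacement
    return None


def iterative_stem(word: str) -> str:
    """Strip suffixes repeatedly until no rule applies.

    Every replacement is shorter than its suffix, so each pass shortens the
    word and the loop always terminates.
    """
    while (shorter := strip_once(word)) is not None:
        word = shorter
    return word


if __name__ == "__main__":
    for w in ["வீடுகளில்", "மாணவர்களுக்கு", "வீடு"]:
        print(w, "->", iterative_stem(w))
```

A practical stemmer would also need exception lists. Under the locative rule alone, a root that merely ends in the letters of a suffix, such as வெயில் ("sunshine"), would be over-stripped. Iterating until a fixed point lets stacked suffixes such as plural plus case be peeled off one layer at a time without enumerating every combination.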

Keywords

Rule-Based Preprocessing, Cyberbullying, NLP, Tamil Stemmer, Lemmatization, Machine Learning