Jairo R. Junior and Leandro A Silva, Presbyterian Mackenzie University, Brazil
Natural language processing models for text classification have advanced significantly with the emergence of pre-trained transformers and deep learning. Despite promising results, deploying these models in production environments remains challenging. Classification models evolve continuously as they are exposed to new data and predictions. However, changes in data distribution over time can degrade performance, indicating that the model has become outdated. This article analyzes the lifecycle of a natural language processing model using multivariate statistical methods capable of detecting model drift over time. These methods can be integrated into the training and workflow management of machine learning models. Preliminary results show that the statistical method Maximum Mean Discrepancy (MMD) performs better at detecting drift in models trained with data from multiple domains, when it is applied to high-dimensional vector representations passed through an untrained auto-encoder. The classifier model achieved 93% accuracy in predicting intents.
Intent recognition, Drift detection, Data drift, MLOps (Machine Learning Operations)
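To make the drift test named in the abstract concrete, the following is a minimal sketch of a two-sample Maximum Mean Discrepancy test with a permutation-based p-value. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian (RBF) kernel, the median-heuristic bandwidth, the number of permutations, and the function names `mmd2_from_kernel` and `mmd_drift_test` are all hypothetical choices. The inputs are assumed to be arrays of vector representations (for example, encoder outputs) for a reference window and a current window.

```python
import numpy as np


def mmd2_from_kernel(k, idx_x, idx_y):
    """Biased estimate of squared MMD from a precomputed kernel matrix."""
    kxx = k[np.ix_(idx_x, idx_x)].mean()
    kyy = k[np.ix_(idx_y, idx_y)].mean()
    kxy = k[np.ix_(idx_x, idx_y)].mean()
    return kxx + kyy - 2.0 * kxy


def mmd_drift_test(reference, current, n_permutations=200, seed=0):
    """Return (squared MMD, permutation p-value) for two samples.

    Assumptions (not taken from the article): RBF kernel, bandwidth set
    by the median heuristic on the pooled sample.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([reference, current])
    n_ref, n_all = len(reference), len(pooled)

    # Pairwise squared Euclidean distances over the pooled sample.
    norms = np.sum(pooled ** 2, axis=1)
    sq = np.maximum(norms[:, None] + norms[None, :] - 2.0 * pooled @ pooled.T, 0.0)

    # Median heuristic: bandwidth from the median off-diagonal distance.
    sigma2 = np.median(sq[np.triu_indices(n_all, k=1)])
    k = np.exp(-sq / (2.0 * sigma2))

    idx = np.arange(n_all)
    observed = mmd2_from_kernel(k, idx[:n_ref], idx[n_ref:])

    # Null distribution: recompute the statistic under random relabeling.
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        perm = rng.permutation(n_all)
        null[i] = mmd2_from_kernel(k, perm[:n_ref], perm[n_ref:])

    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, p_value
```

In use, a small p-value (below a chosen significance level such as 0.05, itself an assumption here) would flag the current window as drifted relative to the reference window. Computing the kernel matrix once and permuting indices keeps the permutation loop inexpensive.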