Aanchal Varma and Chetan Bhat, Freshworks, India
The recent emergence of large language models (LLMs), particularly GPT variants, has attracted considerable attention due to their state-of-the-art performance. However, for highly domain-specific datasets such as sales and support conversations, most LLMs do not perform well out-of-the-box. Finetuning is therefore needed, which many budget-constrained businesses cannot afford. These models also have very slow inference times, making them unsuitable for many real-time applications. The lack of interpretability and of access to probabilistic inferences is another problem. For these reasons, BERT-based models are often preferred. In this paper, we present SAS-BERT, a BERT-based architecture for sales and support conversations. Through novel pre-training enhancements and GPT-3.5-led data augmentation, we demonstrate improvements in BERT performance on highly domain-specific datasets that are comparable with finetuned LLMs. Our architecture has 98.5% fewer parameters than the largest LLM considered, trains in under 72 hours, and can be hosted on a single large CPU for inference.
BERT, LLM, Text Classification, Domain pre-training, NLP applications.