Volume 15, Number 4

Comparing LLMs Using a Unified Performance Ranking System

  Authors

Maikel Leon, University of Miami, USA

  Abstract

Large Language Models (LLMs) such as OpenAI’s GPT, Meta’s LLaMA, and Google’s PaLM have rapidly transformed natural language processing and AI-driven applications. Notwithstanding their transformative power, the absence of a common metric for comparing these models presents a substantial barrier for researchers and practitioners. This research proposes a novel performance ranking metric to satisfy the pressing demand for a comprehensive evaluation system. Our metric compares LLM capabilities holistically by combining qualitative and quantitative evaluations. Through thorough benchmarking, we examine the strengths and weaknesses of leading LLMs, providing insight into their comparative performance. This work aims to advance the development of more reliable and effective language models and to facilitate well-informed decisions when selecting among them.
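For illustration, the sketch below shows one way a unified ranking could combine normalized quantitative benchmark scores with qualitative ratings. The model names, benchmark scores, qualitative ratings, and weights are hypothetical assumptions for demonstration only; they are not the paper's actual metric or data.

```python
from statistics import mean

# Hypothetical inputs (illustrative only): quantitative benchmark scores
# in [0, 1] and averaged qualitative human ratings on a 1-5 scale.
quantitative = {
    "GPT":   {"benchmark_a": 0.86, "benchmark_b": 0.92},
    "LLaMA": {"benchmark_a": 0.79, "benchmark_b": 0.81},
    "PaLM":  {"benchmark_a": 0.78, "benchmark_b": 0.80},
}
qualitative = {"GPT": 4.6, "LLaMA": 4.1, "PaLM": 4.0}

def min_max(values):
    """Min-max normalize a dict of scores to [0, 1] across models."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {k: (v - lo) / span for k, v in values.items()}

def unified_rank(quant, qual, w_quant=0.7, w_qual=0.3):
    """Combine quantitative and qualitative scores into a single ranking."""
    # Average each model's benchmarks, then normalize both score types
    # so they are on a comparable scale before taking a weighted sum.
    quant_norm = min_max({m: mean(s.values()) for m, s in quant.items()})
    qual_norm = min_max(qual)
    combined = {m: w_quant * quant_norm[m] + w_qual * qual_norm[m]
                for m in quant}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

for model, score in unified_rank(quantitative, qualitative):
    print(f"{model}: {score:.3f}")
```

The weighted-sum combination shown here is one simple design choice; any unified metric of this kind must also decide how to normalize heterogeneous scores and how to weight qualitative judgments against benchmark accuracy.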

  Keywords

Large Language Models (LLMs), Performance Evaluation, Benchmarking, Qualitative Analysis, Quantitative Metrics.