Volume 14, Number 3

Large Language Models for Ciphers

Authors

David Noever, PeopleTec, USA

Abstract

This study investigates whether transformer models such as ChatGPT (GPT-4, March 2023) can generalize beyond their training data by examining their performance on a novel Cipher Dataset that scrambles token order. The dataset consists of 654 test cases; the analysis focuses on 51 text examples and 13 algorithmic choices. Results show that the models perform well on low-difficulty ciphers such as the Caesar cipher and can unscramble tokens in 77% of the cipher examples. Despite their reliance on training data, the models' ability to generalize beyond familiar token order is surprising, especially given large-scale models with hundreds of billions of weights trained on a comprehensive text corpus containing few such examples. The original contributions of this work are a cipher challenge dataset and scores measuring how well large language models descramble historically significant ciphers. The real challenge for these generative models lies in executing complex algorithmic steps on new cipher inputs, potentially a novel reasoning challenge that relies less on knowledge acquisition and more on trial-and-error or out-of-bounds responses.
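As a concrete illustration of the "low-difficulty" case the abstract describes, the following is a minimal Caesar cipher sketch in Python. This is not the paper's dataset code; the function name and example text are illustrative assumptions.

```python
def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions, wrapping around the 26-letter alphabet.

    Non-letters (spaces, punctuation) pass through unchanged. Decryption is
    simply encryption with the negated shift.
    """
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

ciphertext = caesar("attack at dawn", 3)   # 'dwwdfn dw gdzq'
plaintext = caesar(ciphertext, -3)         # recovers 'attack at dawn'
```

A model that has merely memorized famous Caesar examples would fail on fresh inputs; executing the shift procedure itself on unseen text is the generalization the study probes.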

Keywords

Transformers, Text Generation, Cipher, Generative Pre-trained Transformers, GPT