D-RAG: A Privacy-Preserving Framework for Decentralized RAG using Blockchain

Authors

Tessa E Andersen¹, Ayanna Marie Avalos², Gaby G. Dagher³ and Min Long³; ¹Boise State University, USA, ²Brigham Young University, USA, ³California State University, USA

Abstract

Retrieval Augmented Generation (RAG) is a recent technique for supplying Large Language Models (LLMs) with current and accurate data. Although RAG has been successful in reducing hallucinations within LLMs, it remains susceptible to inaccurate and maliciously manipulated data. In this paper, we present Distributed-RAG (D-RAG), a novel blockchain-based framework designed to increase the integrity of the RAG system. D-RAG addresses the risk of malicious data by replacing RAG's traditionally centralized database with communities, each consisting of a database and a permissioned blockchain. Each community covers a different subject and contains experts in that field, who verify data through a privacy-preserving consensus protocol before it is added to the database. A Retrieval Blockchain is also designed to communicate between the communities. For each query, the miners on this Retrieval Blockchain retrieve documents from the databases and rank them using an LLM. The miners then agree on these rankings, and the top-ranked documents are provided to the LLM together with the query to generate a response. We perform experiments on our proposed D-RAG framework, and our results show that the Retrieval Blockchain is scalable and that the privacy-preserving consensus protocol remains efficient as the number of community members increases. These results demonstrate that, in a real-world application setting, D-RAG can scale while maintaining data integrity.
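
The ranking-agreement step described above can be illustrated with a minimal sketch. The abstract does not specify how miners reach agreement on rankings, so the Borda-count aggregation below is an assumption chosen purely for illustration, not the protocol used in D-RAG. The function name `borda_aggregate`, the document IDs, and the hard-coded miner rankings (standing in for LLM-produced rankings) are all hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings, top_k):
    """Combine per-miner rankings (best document first) into one ranking.

    Assumption: Borda count stands in for the agreement step; the
    framework's actual consensus mechanism is not given in the abstract.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, doc_id in enumerate(ranking):
            # The best-ranked document earns n points, the next n - 1, etc.
            scores[doc_id] += n - position
    # Sort by descending score; break ties by document ID so that every
    # miner derives the same order from the same inputs.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_k]


# Hypothetical rankings from three miners; in the framework each miner
# would obtain its ranking from an LLM.
miner_rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_c"],
    ["doc_a", "doc_c", "doc_b"],
]

# The agreed top-ranked documents would accompany the query to the LLM.
agreed_top = borda_aggregate(miner_rankings, top_k=2)
print(agreed_top)
```

The deterministic tie-break matters in this setting: if independent miners are to converge on one result, identical inputs must always yield an identical ordering.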

Keywords

Blockchain, RAG, LLM, Privacy-Preserving.