Split-Brain RAG: Why Large Language Models Are Not Enough for Scientific Question Answering

Authors

Jodi Moselle Alcantara and Armielyn Obinguar, Independent Researchers, Philippines

Abstract

This work evaluates monolithic LLM deployments in Ricerca Paperchat, a Retrieval-Augmented Generation (RAG) system for academic inquiry. Current approaches typically rely on a "monolithic" architecture in which a single LLM handles all tasks. This study posits that monolithic architectures are inherently inefficient for scientific QA because tasks impose divergent computational requirements: simple retrieval prioritizes speed, whereas complex synthesis demands reasoning. Furthermore, relying on a single generative model introduces risks of "hallucination inheritance," where models trained on synthetic data replicate the errors of their generators, leading to a degradation of knowledge integrity over time. The authors tested model compliance against five rigid output constraints: formatting citations, generating Markdown tables, outputting raw JSON, adhering to bullet-point styles, and correctly formatting code headers.

Keywords

Large Language Models, Retrieval-Augmented Generation, Scientific Question Answering, Complexity-Aware Routing, LLM Performance Evaluation