Mycroft - Retrieval Augmented Generation for SDK Documentation

Authors

Diego Costa, Gabriel Matos, Gilson Russo, Leon Barroso and Erick Bezerra, Brazil

Abstract

Information retrieval plays an important role in everyday tasks, especially when working with documentation. Retrieving information about private documentation used to build other software is particularly challenging because the documentation is not available on the internet, so no information about it exists beyond the documentation itself. Due to concerns about confidential data, using external proprietary systems is prohibited. Motivated by these constraints, we present Mycroft, a retrieval system that uses the Retrieval Augmented Generation technique to improve search and information retrieval for user questions about the documentation. To evaluate the system, a dataset of questions and answers about the documentation was generated. The system was developed on-premise using open-source Large Language Models and evaluated using Natural Language Processing metrics and human evaluation to validate the generated answers. After an extensive evaluation, the proposed retrieval system demonstrated satisfactory performance in addressing user queries and achieved favorable outcomes in human evaluation, indicating its utility.

Keywords

integer labels, time series, optimization, performance, data warehouse, indexing, aggregation