SciRAG-QA: Multi-domain Closed-Question Benchmark Dataset for Scientific QA

Description

One of the most impactful recent applications of the growing capabilities of Large Language Models (LLMs) is their use in Retrieval-Augmented Generation (RAG) systems. RAG applications are inherently more robust against LLM hallucinations and provide source traceability, which is critical in the scientific reading and writing process. However, validating such systems is essential given the stringent requirements of the scientific domain. Existing benchmark datasets are limited in the scope of research areas they cover, often focusing on the natural sciences, which restricts their applicability for validating RAG systems in other scientific fields. To address this gap, we present a closed-question answering (QA) dataset for benchmarking scientific RAG applications. The dataset spans 34 research topics across 10 distinct areas of study and includes 108 manually curated question-answer pairs, each annotated with an answer type, a difficulty level, and a gold reference, along with a link to the source paper. Further details on each of these attributes can be found in the accompanying README.md file.

Please cite the following publication when using the dataset: TBD
The publication is available at: TBD
A preprint version of the publication is available at: TBD

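As a rough illustration of how the annotated attributes might be consumed, the sketch below loads the QA pairs and tallies the difficulty levels. The file name and column names used here are assumptions for illustration only; the actual file layout and schema are documented in the dataset's README.md.

    # Minimal sketch, assuming the QA pairs ship as a CSV file named
    # "sciragqa.csv" with columns such as "question", "answer",
    # "answer_type", "difficulty", "gold_reference", and "source_paper".
    # These names are hypothetical; consult README.md for the real schema.
    import csv
    from collections import Counter

    with open("sciragqa.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # The dataset description states 108 manually curated pairs.
    print(f"{len(rows)} question-answer pairs")

    # Distribution of the annotated difficulty levels across the pairs.
    print(Counter(row["difficulty"] for row in rows))
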
Year of publication

2024

Authors

Zenodo - Publisher

Mahira Ibnath Joytu - Creator

Md Raisul Kibria - Creator

Sébastien Lafond - Creator

Other information

Fields of science

Computer and information sciences

Open access

Open

License

Creative Commons Attribution 4.0 International (CC BY 4.0)
