Clotho-AQA dataset

Description

Clotho-AQA is an audio question-answering dataset consisting of 1991 audio samples taken from the Clotho dataset [1]. Each audio sample has 6 associated questions collected through crowdsourcing. For each question, answers are provided by three different annotators, making a total of 35,838 question-answer pairs (1991 × 6 × 3). For each audio sample, 4 questions are designed to be answered with 'yes' or 'no', while the remaining 2 questions are designed to be answered with a single word.

More details about the data collection and data splitting processes can be found in the following paper:

S. Lipping, P. Sudarsanam, K. Drossos, T. Virtanen, ‘Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering.’

The paper is available online at 2204.09634.pdf (arxiv.org). If you use the Clotho-AQA dataset, please cite the paper mentioned above.

A sample baseline model that uses the Clotho-AQA dataset can be found at partha2409/AquaNet (github.com).

To use the dataset:
• Download and extract ‘audio_files.zip’. This archive contains all 1991 audio samples in the dataset.
• Download ‘clotho_aqa_train.csv’, ‘clotho_aqa_val.csv’, and ‘clotho_aqa_test.csv’. These files contain the train, validation, and test splits, respectively. Each file lists the audio file name, the questions, the answers, and the confidence scores provided by the annotators.

License:

The audio files in the archive ‘audio_files.zip’ are under the corresponding licenses (mostly Creative Commons with attribution) of the Freesound [2] platform. The license of each audio file is stated explicitly in the CSV file ‘clotho_aqa_metadata.csv’. That is, each audio file in the archive is listed in that CSV file together with its metadata. The metadata for each file are:
• File name
• Keywords
• URL for the original audio file
• Start and ending samples for the excerpt that is used in the Clotho dataset
• Uploader/user in the Freesound platform (manufacturer)
• Link to the license of the file

The questions and answers in the files:
• clotho_aqa_train.csv
• clotho_aqa_val.csv
• clotho_aqa_test.csv
are under the MIT license, described in the LICENSE file.

References:

[1] K. Drossos, S. Lipping, and T. Virtanen, "Clotho: An Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.

[2] F. Font, G. Roma, and X. Serra, "Freesound technical demo," in Proceedings of the 21st ACM International Conference on Multimedia (MM '13), ACM, New York, NY, USA, 2013, pp. 411-412, doi: 10.1145/2502081.2502245.
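As a rough illustration of how the split files might be consumed, the following minimal Python sketch groups the answers for each (audio file, question) pair and takes a majority vote across annotators. The column names `file_name`, `QuestionText`, and `answer` are assumptions, not confirmed by this description, so check the header row of the downloaded CSV files. The sample rows are made up for illustration and are not taken from the dataset. The confidence column is ignored here.

```python
import csv
import io
from collections import Counter, defaultdict

# Assumed column names; verify them against the header row of the real files.
FILE_COL = "file_name"
QUESTION_COL = "QuestionText"
ANSWER_COL = "answer"


def read_rows(handle):
    """Read question-answer rows from an open CSV file handle."""
    return list(csv.DictReader(handle))


def group_answers(rows):
    """Collect the annotator answers for each (audio file, question) pair."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row[FILE_COL], row[QUESTION_COL])
        grouped[key].append(row[ANSWER_COL].strip().lower())
    return dict(grouped)


def majority_answer(answers):
    """Return the answer given by a strict majority, or None if there is none."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > len(answers) / 2 else None


def is_binary(answers):
    """True when every answer is 'yes' or 'no'."""
    return all(a in {"yes", "no"} for a in answers)


# Toy rows invented for this sketch; the real files also carry a
# confidence column, which is left out here.
SAMPLE_CSV = """file_name,QuestionText,answer
example.wav,Is it raining?,yes
example.wav,Is it raining?,yes
example.wav,Is it raining?,no
example.wav,What animal is heard?,dog
example.wav,What animal is heard?,dog
example.wav,What animal is heard?,wolf
"""

rows = read_rows(io.StringIO(SAMPLE_CSV))
grouped = group_answers(rows)
for (file_name, question), answers in grouped.items():
    kind = "yes/no" if is_binary(answers) else "single-word"
    print(file_name, question, kind, majority_answer(answers))
```

To run this on a real split, replace the in-memory sample with a file handle, for example `open("clotho_aqa_train.csv", newline="", encoding="utf-8")`. Majority voting is only one possible way to reconcile the three annotators; it is not necessarily the procedure used in the paper.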

Year of publication

2022

Type of data

Authors

Konstantinos Drossos - Creator

Parthasaarathy Ariyakulam Sudarsanam - Creator

Samuel Lipping - Creator

Tuomas Virtanen - Creator

Zenodo - Publisher

Project

Other information

Fields of science

Computer and information sciences

Language

English

Open access

Open

License

License Not Specified

Keywords

Computer and information sciences

Subject headings

Temporal coverage


Related to this research data