DigiTala's YKI data
Description
This resource is available via Kielipankki – The Language Bank of Finland.
DigiTala-YKI contains two subsets of data:
1. Speech samples from participants of the general language tests (Yleiset kielitutkinnot, YKI) for Finnish as a second language (207 speakers), transcripts of the speech samples, background information about the speakers, a dataset of human ratings of the speech samples, the raters' background information and their responses to post-rating surveys.
The speech samples are responses to narrative tasks. The exact task descriptions from the YKI tests are not available. The rating data was collected in 2021, during the DigiTala research project (2019–2023). The tasks, the surveys and the rating criteria are available via https://zenodo.org/communities/digitala/.
2. Speech samples from participants of the general language tests (Yleiset kielitutkinnot, YKI) for Swedish as a second language (24 speakers), transcripts of the speech samples, and background information about the speakers. Ratings were not performed for this subset.
The resource also contains the consent forms and the information provided to the research participants.
Information about the size of the subsets:
digitala_yki_audios_fi: 404 recordings, mean duration 105 sec, total duration 9.66 h, 207 unique speakers + 356 transcriptions + 4 related text/table files
digitala_yki_audios_swe: 24 recordings, mean duration 91 sec, total duration 0.60 h, 24 unique speakers + 7 transcriptions + 1 related text/table file
Authors of the resource:
Heini Kallio, Sari Ohranen, Tuija Hirvelä, Ari Huhta, Anna von Zansen, Yaroslav Getman, Ekaterina Voskoboinik, Ragheb Al-Ghezi, Milla Sneck, Mikko Kuronen, Mikko Kurimo, Raili Hildén
Year of publication
2023
Authors
University of Helsinki
Heini Kallio - Curator
Other information
Fields of science
Languages
Language
Finnish, Swedish
Open access
Restricted access