The Suomi24 Corpus 2001-2017, VRT version 1.3
Description
The corpus is available for download from Kielipankki, the Language Bank of Finland.
The corpus contains all the texts available through the Suomi24 API from the discussion forums of the Suomi24 online social networking website, covering 1 January 2001 to 31 December 2017. Jussi Piitulainen created the tokenized version and then carried out the annotation.
Updates:
2025-04-11: For version 1.3, the data was updated with annotations of names recognized with FiNER 1.6 and of sentence languages identified with HeLI-OTS 2.0.
The entire corpus in the VRT format is downloadable for academic research purposes.
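For readers unfamiliar with VRT (verticalized text), the following is a minimal sketch of how such a file might be read. It assumes only the general VRT convention of one token per line with tab-separated attributes, interleaved with XML-like structural tags. The element names (`text`, `sentence`), the two-column layout (word form, lemma), and the sample data are illustrative assumptions, not the actual attribute set of this corpus; consult the corpus documentation for the real columns.

```python
from typing import Iterable, Iterator, List


def iter_sentences(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of token rows (tab-separated fields).

    Assumes sentences are closed by a </sentence> tag and that literal
    '<' characters in tokens are escaped, so any line starting with '<'
    is a structural tag or comment rather than a token.
    """
    sentence: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("</sentence"):
            if sentence:
                yield sentence
            sentence = []
        elif line.startswith("<"):
            continue  # other structural tags carry no tokens
        else:
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence  # tolerate a missing final closing tag


# Hypothetical sample, invented for illustration only.
sample = """<text>
<sentence>
Hei\thei
maailma\tmaailma
</sentence>
<sentence>
Kiitos\tkiitos
</sentence>
</text>
"""

for tokens in iter_sentences(sample.splitlines()):
    print(" ".join(row[0] for row in tokens))
```

Reading line by line keeps memory use flat, which matters for a corpus of this size; the same generator can be fed an open file handle instead of the sample string.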
Year of publication
2020
Type of data
Authors
Aller Media Oy - Creator
University of Helsinki - Curator
Project
Other information
Fields of science
Languages
Language
Finnish
Open access
Restricted access