The Suomi24 Corpus 2018-2020, VRT version 1.1

Description

The corpus is available for download from Kielipankki, the Language Bank of Finland. It contains all the texts available through the Suomi24 API from the discussion forums of the Suomi24 online social networking website, covering 1.1.2018 to 31.12.2020. Jussi Piitulainen created the tokenized version and carried out the annotation process. Update 2025-04-14: for version 1.1, the data was updated with annotations of names recognized with FiNER 1.6 and of sentence languages identified with HeLI-OTS 2.0. The entire corpus in VRT format may be downloaded for academic research purposes.

Year of publication

2021

Authors

City Digital Group - Creator

User support FIN-CLARIN - Curator

Other information

Fields of science

Languages

Language

Finnish

Open access

Restricted access

License

CLARIN ACA+NC (Academic, Non Commercial) End User License 1.0

Temporal coverage

1.1.2018 to 31.12.2020
