Parallel Bible Verses for Uralic Studies version 2, Korp

Description

This resource will be available via Korp in Kielipankki – the Language Bank of Finland. These parallel corpora consist of Biblical verses (historical and contemporary, 1821–2023) in Erzya (myv), Moksha (mdf); Olonets-Karelian (Livvi) (olo), Dvina-Karelian (North Karelian Proper) (krl), Livonian (liv), Veps (vep); Khanty (kca), Mansi (mns); Komi-Permyak (koi), Komi-Zyrian (kpv), Udmurt (udm); Meadow & Eastern Mari (mhr) and Hill Mari (mrj). Most of the texts, in particular the newer translations, come from the Institute for Bible Translation in Helsinki, Finland, as originally organized for the University of Helsinki Language Corpus Server (UHLCS). Finnish, Estonian, Hungarian, Russian and Ukrainian translations are also included.
The purpose of these parallel corpora is to further the study of translation in Uralic minority languages. They also make it possible to follow changes in lexical and syntactic strategies across different versions of Biblical verses in one language, or to compare lexicon and structure between languages.
Lemmatization and morphological analyses are provided for all languages except Khanty, Estonian, Hungarian, Russian and Ukrainian; accuracy for the analyzed languages should improve as disambiguation resources are developed. The minority languages have been lemmatized and annotated for both morphological features and syntactic dependencies using open HFST analysers developed in the GiellaLT (Clarino) infrastructure. The Finnish texts have been analyzed with TNPP (Turku Neural Parser Pipeline), which provides lemmatization, morphological analysis and syntactic annotation. Two closely related Slavic languages are included on the grounds that, historically, Slavic contact has been represented by the colloquially spoken idioms of Kiev, Novgorod, St Petersburg and Moscow.
The 27 books of the New Testament are included for the following 15 languages: est (2022), fin (1932–1938), hun (2021), koi (2019), kpv (2008), krl (2011), mdf (2016), mhr (2007), mrj (2014), myv (1821–1827, 2006), olo (2003), rus (1876), udm (1997, 2013), ukr (2022), vep (2006). The 39 books of the Old Testament are included for two languages: fin (1932–1938) and udm (2013). Additionally, the following books are included: kca (2013–2018): MRK, GEN, JON; koi (1996): MRK; kpv (1995–1997): MRK, JHN; krl (2020–2023): JON; liv (1942): MAT; mdf (1901): JHN; mdf (1995): MRK; mdf (2020–2022): GEN, EXO; mhr (1994–1995): MRK, LUK, JHN; mns (2000–2016): MAT, MRK, LUK, JHN, JON; myv (1910): MAT, MRK, LUK, JHN; myv (1995–1998): MAT, MRK, LUK, ACT; myv (2011–2020): RUT, PSA, ECC, SNG, JON; olo (1993–1997): MAT, MRK, LUK, JHN; olo (2006–2020): GEN, RUT, PSA, PRO, ISA, JON; udm (2016): TOB, JDT, WIS, SIR, BAR, LJE, 1MA, 2MA, 3MA, 1ES, 2ES; vep (1992–1998): MAT, MRK, JHN; vep (2012–2023): RUT, PSA, PRO, JON.

Year of publication

2024

Type of data

Authors

Raamatunkäännösinstituutti ry - Rights holder

User support FIN-CLARIN - Curator

Axelson - Creator

Jack Rueter - Creator

Project

Other information

Fields of science

Languages

Language

Veps language, Karelian, Estonian, Finnish, Hungarian language, Khanty language, Komi-Permyak language, Komi-Zyrian language, Livonian, Moksha language, Meadow Mari language, Mansi language, Hill Mari language, Erzya language, Livvi-Karelian language, Russian language, Udmurt language, Ukrainian

Open access

Open

License

Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0)

Keywords

Subject headings

Temporal coverage


Related to this research data