Uralic, Turkic, Indo-Iranian and Mongol languages; languages of Siberia and Caucasia (UHLCS)
Description
The corpus is available in Kielipankki, the Language Bank of Finland (puhti.csc.fi; access rights instructions: http://www.kielipankki.fi/access).
Location: appl/data/kielipankki/mrc-uhlcs/multilingual-language-archive/
The corpus contains texts in Chukchi, Koryak, Kurdish, Ossete, Tajik, Avar, Lak, Tabassaran, Kalmyk, Even, Evenki, and Nanay, as well as in various Uralic and Turkic languages.
All included languages are listed below in alphabetical order, each with its location subfolder:
Avar (location: north-east-caucasian-lgs/avar-andi-tsez-lgs/avar/)
Azerbaijani (location: turkic-lgs/south-west-turkic-lgs/azerbaijani/)
Balkar (location: turkic-lgs/north-west-turkic-lgs/balkar/)
Bashkir (location: turkic-lgs/north-west-turkic-lgs/bashkir/)
Chukchi (location: chukotko-kamchatkan-lgs/chukchi/)
Chuvash (location: turkic-lgs/bolgar-group/chuvash/)
Crimean-Turkish (location: turkic-lgs/north-west-turkic-lgs/crimean-turkish/)
Dvina-Karelian (location: uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/dvina-karelian/)
Eastern & Meadow Mari (location: uralic-lgs/finno-ugric-lgs/mari-lgs/eastern-mari/)
Enets (location: uralic-lgs/samoyedic-lgs/enets/)
Erzya (location: uralic-lgs/finno-ugric-lgs/mordvin-lgs/erzya/)
Even (location: tungusic-lgs/north-tungusic-lgs/even/)
Evenki (location: tungusic-lgs/north-tungusic-lgs/evenki/)
Gagauz (location: turkic-lgs/south-west-turkic-lgs/gagauz/)
Hill Mari (location: uralic-lgs/finno-ugric-lgs/mari-lgs/western-mari/)
Kalmyk (location: mongolic-lgs/east-mongolic-lgs/kalmyk/)
Kamas (location: uralic-lgs/samoyedic-lgs/kamas/)
Khakas (location: turkic-lgs/north-east-turkic-lgs/khakas/)
Kildin-Saami (location: uralic-lgs/finno-ugric-lgs/saami-lgs/kildin-saami/)
Kirghiz (location: turkic-lgs/north-west-turkic-lgs/kirghiz/)
Komi-Permyak (location: uralic-lgs/finno-ugric-lgs/permic-lgs/komi/permyak/)
Koryak (location: chukotko-kamchatkan-lgs/koryak/)
Kurdish (location: indo-european-lgs/iranian-lgs/west-iranian-lgs/kurdish/)
Lak (location: north-east-caucasian-lgs/lak-dargva-lgs/lak/)
Livonian (location: uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/livonian/)
Mansi (location: uralic-lgs/finno-ugric-lgs/ugric-lgs/mansi/)
Moksha (location: uralic-lgs/finno-ugric-lgs/mordvin-lgs/moksha/)
Nanay (location: tungusic-lgs/south-tungusic-lgs/nanay/)
Olonets-Karelian aka Livvi (location: uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/livvi/)
Ossete (location: indo-european-lgs/iranian-lgs/east-iranian-lgs/ossete/)
Selkup (location: uralic-lgs/samoyedic-lgs/selkup/)
Tabassaran (location: north-east-caucasian-lgs/lezgian-lgs/tabassaran/)
Tajik (location: indo-european-lgs/iranian-lgs/west-iranian-lgs/tajik/)
Tatar (location: turkic-lgs/north-west-turkic-lgs/tatar/)
Turkmen (location: turkic-lgs/south-west-turkic-lgs/turkmen/)
Tuvin (location: turkic-lgs/north-east-turkic-lgs/tuvin/)
Udmurt (location: uralic-lgs/finno-ugric-lgs/permic-lgs/udmurt/)
Uighur (location: turkic-lgs/south-east-turkic-lgs/uighur/)
Uzbek (location: turkic-lgs/south-east-turkic-lgs/uzbek/)
Veps (location: uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/veps/)
Yakut aka Sakha (location: turkic-lgs/north-east-turkic-lgs/yakut/)
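The subfolders in the list are given relative to the Location stated in the description. As a minimal sketch of how the two parts combine, the following Python snippet joins the base location with a language's subfolder. The helper name corpus_path and the small SUBFOLDERS mapping are hypothetical and introduced only for illustration; the mapping holds just a few entries copied from the list, and the base path is used exactly as written in the description.

```python
from pathlib import PurePosixPath

# Base location exactly as given in the description. How it is rooted on
# puhti.csc.fi is not stated there, so it is kept as a relative path here.
BASE = PurePosixPath("appl/data/kielipankki/mrc-uhlcs/multilingual-language-archive")

# A few subfolders copied from the language list (illustrative, not complete).
SUBFOLDERS = {
    "Avar": "north-east-caucasian-lgs/avar-andi-tsez-lgs/avar/",
    "Chukchi": "chukotko-kamchatkan-lgs/chukchi/",
    "Kalmyk": "mongolic-lgs/east-mongolic-lgs/kalmyk/",
    "Udmurt": "uralic-lgs/finno-ugric-lgs/permic-lgs/udmurt/",
}


def corpus_path(language: str) -> PurePosixPath:
    """Hypothetical helper: join the base location with a language subfolder."""
    return BASE / SUBFOLDERS[language]


if __name__ == "__main__":
    for name in sorted(SUBFOLDERS):
        print(f"{name}: {corpus_path(name)}")
```

PurePosixPath is used so that the paths are built with forward slashes regardless of the operating system on which the snippet runs.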
The corpus is a part of the Multilingual Resource Collection of the UHLCS.
UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (suihkonen.pirkko@gmail.com).
The intended use of the resource must be outlined in a research plan.
Year of publication
2025
Authors
User support at CSC - IT Center for Science Ltd. The Language Bank of Finland - Curator
Multiple publishers, check distribution rights holders in original metadata by following its persistent identifier - Publisher
Pirkko Suihkonen - Creator, Rights holder
Other information
Fields of science
Languages
Language
Avar language, Chukchi language, Even language, Evenki language, Nanai language, Koryak language, Kurdish language, Lak language, Ossetic language, Tabasaran language, Tajik language, Kalmyk Oirat
Open access
Restricted access