Lists of Words Corpus (UHLCS)

Description

The corpus is available in Kielipankki, the Language Bank of Finland (puhti.csc.fi; access rights instructions: http://www.kielipankki.fi/access).

Location: /appl/data/kielipankki/mrc-uhlcs/general-linguistics/multilingual-data/words/

The lists of words were generated from corpora of the following languages:

* Dutch: 178,430 words, 1,998,881 characters
* Finnish: proper names: 714 names, 4,488 characters; general list of words: 264,654 words, 3,171,148 characters
* French: 138,257 words, 1,524,757 characters
* German: 160,086 words, 2,060,734 characters
* Italian: 60,453 words, 561,982 characters
* Norwegian: 61,843 words, 589,234 characters
* Swedish: 13,328 words, 117,685 characters

Type of the documents: words in alphabetical order.
Character encoding: ASCII.

The lists of words were compiled at the University of Helsinki, Department of General Linguistics. The Lists of Words Corpus is part of the UHLCS corpus collection. UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (suihkonen.pirkko@gmail.com).

License details: http://urn.fi/urn:nbn:fi:lb-2015041002
The intended use of the resource must be outlined in a research plan.
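For users who already have access, the following is a minimal Python sketch of how one list could be checked against the word and character totals given above. It rests on assumptions the record does not confirm: that each file holds one word per line, that the character totals exclude line terminators, and that "alphabetic order" can be approximated by a case-insensitive comparison. The function name `summarize_word_list` is hypothetical, and the file path is taken from the command line because the record does not name the individual files.

```python
import sys
from pathlib import Path


def summarize_word_list(path):
    """Return (word_count, character_count, is_sorted) for a word-list file.

    Assumes one word per line (not confirmed by the record).
    """
    # The record states the encoding is ASCII; decoding strictly as ASCII
    # raises UnicodeDecodeError if the file contains any other bytes.
    text = Path(path).read_text(encoding="ascii")
    words = [line.strip() for line in text.splitlines() if line.strip()]

    word_count = len(words)
    # Assumption: characters are counted without line terminators.
    character_count = sum(len(word) for word in words)
    # Approximate check of alphabetical order (case-insensitive); the
    # corpus may use a different collation.
    is_sorted = all(
        a.casefold() <= b.casefold() for a, b in zip(words, words[1:])
    )
    return word_count, character_count, is_sorted


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py <path-to-one-word-list-file>
    count, chars, ordered = summarize_word_list(sys.argv[1])
    print(f"{count} words, {chars} characters, sorted: {ordered}")
```

If the totals differ from the published figures, the counting assumptions above (for example, whether line terminators are included) are the first thing to revisit.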

Year of publication

2018

Authors

User support at CSC - IT Center for Science Ltd. The Language Bank of Finland - Curator

Multiple publishers, check distribution rights holders in original metadata by following its persistent identifier - Publisher

Pirkko Suihkonen - Rights holder, Creator

Other information

Fields of science

Languages

Language

German, Finnish, French, Italian, Dutch, Norwegian, Swedish

Open access

Restricted access

License

CLARIN RES (Restricted) End User License 1.0