Lists of Words Corpus (UHLCS)
Description
The corpus is available in Kielipankki, the Language Bank of Finland, on puhti.csc.fi (access rights instructions: http://www.kielipankki.fi/access).
Location: /appl/data/kielipankki/mrc-uhlcs/general-linguistics/multilingual-data/words/
The lists of words were generated from the corpora of the following languages:
* Dutch: 178,430 words, 1,998,881 characters
* Finnish: proper names: 714 names, 4,488 characters; general list of words: 264,654 words, 3,171,148 characters
* French: 138,257 words, 1,524,757 characters
* German: 160,086 words, 2,060,734 characters
* Italian: 60,453 words, 561,982 characters
* Norwegian: 61,843 words, 589,234 characters
* Swedish: 13,328 words, 117,685 characters
Document type: word lists in alphabetical order.
Character encoding: ASCII.
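As a rough illustration of how one of these lists could be checked against the figures above, the following Python sketch counts words and characters in a single file and tests whether it is sorted. It assumes one word per line, which the description does not state explicitly. The function name `summarize_word_list` is hypothetical, and it is not known whether the published character counts include line terminators, so the character total here covers the words only.

```python
from pathlib import Path


def summarize_word_list(path, encoding="ascii"):
    """Return (word_count, char_count, is_sorted) for a word list file.

    Assumption: the file holds one word per line. The record only says
    "words in alphabetic order", so adjust the parsing if the layout differs.
    """
    text = Path(path).read_text(encoding=encoding)
    # Skip blank lines so a trailing newline is not counted as a word.
    words = [line.strip() for line in text.splitlines() if line.strip()]
    # Characters in the words themselves, excluding line terminators.
    char_count = sum(len(word) for word in words)
    # Plain code-point comparison; the original sort order may have been
    # case-insensitive or locale-specific, so False is not proof of an error.
    is_sorted = all(a <= b for a, b in zip(words, words[1:]))
    return len(words), char_count, is_sorted
```

Calling `summarize_word_list` with the path of a list under the directory given in Location returns the three values as a tuple. A file that is not pure ASCII raises `UnicodeDecodeError`, which doubles as a check of the stated encoding.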
The lists of words were compiled at the University of Helsinki, Department of General Linguistics. The Lists of Words Corpus is a part of the UHLCS corpus collection.
UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (suihkonen.pirkko@gmail.com).
License details: http://urn.fi/urn:nbn:fi:lb-2015041002
The intended use of the resource must be outlined in a research plan.
Year of publication
2018
Authors
User support at CSC - IT Center for Science Ltd. The Language Bank of Finland - Curator
Multiple publishers, check distribution rights holders in original metadata by following its persistent identifier - Publisher
Pirkko Suihkonen - Rights holder, Creator
Other information
Languages
German, Finnish, French, Italian, Dutch, Norwegian, Swedish
Access rights
Restricted access