JRC-Acquis Multilingual Parallel Corpus

Description

The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now. As of the beginning of 2007, the EU had 27 Member States and 23 official languages. The Acquis Communautaire texts exist in these languages, although Irish translations are not currently available. The Acquis Communautaire is thus a collection of parallel texts in the following 22 languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

The data release by the JRC is in line with the general effort of the European Commission to support multilingualism, language diversity and the re-use of Commission information. The Language Technology group of the European Commission's Joint Research Centre did not receive an authoritative list of documents that belong to the Acquis Communautaire. In order to compile the document collection distributed here, we selected all those CELEX documents (see below) that were available in at least ten of the twenty EU-25 languages (the official languages of the EU before Bulgaria and Romania joined in 2007) and that additionally existed in at least three of the nine languages that became official languages with the Enlargement of the EU in 2004 (i.e. Czech, Estonian, Hungarian, Lithuanian, Latvian, Maltese, Polish, Slovak and Slovene).

The collection distributed here is thus an approximation of the Acquis Communautaire, which we call the JRC-Acquis. The JRC-Acquis must not be seen as a legal reference corpus. Instead, the purpose of the JRC-Acquis is to provide a large parallel corpus of documents for (computational) linguistics research purposes.
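
The selection rule described above can be illustrated with a minimal Python sketch. It is not the JRC's actual tooling: the function names, the two-letter language codes and the input mapping from a CELEX identifier to its set of available languages are all assumptions made for illustration.

```python
# Illustrative sketch only: names, language codes and the input format are
# assumptions, not part of the JRC-Acquis distribution.

# The nine languages that became official with the 2004 Enlargement.
NEW_2004 = {"cs", "et", "hu", "lt", "lv", "mt", "pl", "sk", "sl"}

# The eleven languages that were already official before 2004.
PRE_2004 = {"da", "de", "el", "en", "es", "fi", "fr", "it", "nl", "pt", "sv"}

# The twenty EU-25 languages (Bulgarian and Romanian are not among them).
EU25 = PRE_2004 | NEW_2004


def is_selected(available_languages):
    """Return True if a document meets both availability thresholds."""
    langs = set(available_languages)
    enough_eu25 = len(langs & EU25) >= 10      # at least ten of the twenty
    enough_new = len(langs & NEW_2004) >= 3    # at least three of the nine
    return enough_eu25 and enough_new


def select_documents(availability):
    """Filter a {document_id: languages} mapping; returns sorted identifiers."""
    return sorted(doc for doc, langs in availability.items() if is_selected(langs))
```

Under these assumptions, a document available in all eleven pre-2004 languages but in none of the nine 2004 languages would be rejected, while one available in seven pre-2004 languages plus Czech, Polish and Slovene would be kept.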
Terms and conditions: https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis#Usage conditions / Licensing issues

Year of publication

2018

Type of data

Authors

European Commission - Joint Research Centre

Ralf Steinberger - Publisher, Curator, Creator

Project

Other information

Fields of science

Languages

Language

Bulgarian language, Czech language, Danish language, German, Greek, Modern (1453-), English, Estonian, Finnish, French, Hungarian language, Italian, Latvian, Lithuanian language, Maltese language, Dutch, Polish, Portuguese, Romanian language, Slovak language, Slovene language, Spanish, Swedish

Open access

Restricted access

License

Other

Keywords

Subject headings

Related to this research data