High Performance Language Technologies
Acronym
HPLT
Description of the granted funding
High Performance Language Technologies (HPLT) is a space combining petabytes of natural language data with large-scale model training. With trillions of words of text, the space will be the largest open text collection. Cleaning and privacy-protecting services improve the quality and ethical properties of the text. Going beyond static repositories that require the user to analyze each dataset individually, the project will rate datasets by how much they improve end-to-end language models and machine translation systems. Continuous integration of models and data will result in freely downloadable, high-quality models for all official European Union languages and beyond. The models will be reproducible, with information and evaluation metrics shown in a publicly available dashboard. By focusing on training at scale, the project complements the inference-focused European Language Grid, which in turn will be used for model deployment. Datasets, models, and information about them will be published in recognized FAIR data repositories, aggregation catalogues, and marketplaces for easy discovery, access, replication, and exploitation.
Starting year
2022
End year
2025
Granted funding
UNINETT SIGMA2 AS (NO)
408 750 €
Participant
PROMPSIT LANGUAGE ENGINEERING, SL (ES)
414 400 €
Participant
CESNET ZAJMOVE SDRUZENI PRAVNICKYCH OSOB (CZ)
415 000 €
Participant
UNIVERZITA KARLOVA (CZ)
641 812.5 €
Coordinator
THE UNIVERSITY OF EDINBURGH (UK)
Participant
UNIVERSITETET I OSLO (NO)
717 100 €
Participant
Amount granted
3 880 688 €
Funder
European Union
Funding instrument
HORIZON Innovation Actions
Framework programme
Horizon Europe (HORIZON)
Call
Programme part
Digital, Industry and Space (11704)
Advanced Computing and Big Data (11711)
Topic
Technologies for data management (AI, Data and Robotics Partnership) (IA) (HORIZON-CL4-2021-DATA-01-03)
Call ID
HORIZON-CL4-2021-DATA-01
Other information
Funding decision number
101070350
Identified topics
languages, language policy