LUT University Energy Consumption/Production Dataset

Description

The data was collected at LUT University, Lappeenranta, Finland. Save for measurement or API errors, all variables were sampled at an hourly rate and logged using UTC timestamps. The dataset comprises:

- The aggregated energy consumed by an entire building [kW].
- The aggregated electricity generated by the PV panel array [kW].
- Day-ahead (ELSPOT) prices for the Finnish Market [€/MWh].
  - This information is made publicly available by [ENTSO-e](https://newtransparency.entsoe.eu/market/energyPrices). The version found in `raw/elspot.parquet` comprises only the relevant years and uses UTC timestamps instead of local time.
- Meteorological variables measured 6 km away from campus at the Lappeenranta airport by the [Finnish Meteorological Institute](https://en.ilmatieteenlaitos.fi/) (see table below).

| Variable | Unit |
| --- | --- |
| Air Temperature | °C |
| Cloud Amount | Okta |
| Dew Point Temperature | °C |
| Global/Diffuse Radiation | W/m² |
| Gust Speed | m/s |
| Horizontal Visibility | m |
| Pressure | hPa |
| Relative Humidity | % |
| Sunshine | % |
| Wind Direction | ° |
| Wind Speed | m/s |

The production and consumption columns are stored in two separate files (`raw/{consumption/production}.parquet`); thus, for the sake of consistency, the datasets were clipped to their overlapping period and joined into a single table. Discrepancies between duplicated columns arise from missing values in one of the two sources; a robust average (averaged if not null) was set as the consensus value for the redundant measurements. The hourly timestamps were first enforced via upscaling without interpolation.

A graphical analysis of the raw data revealed that the measurements are naturally split by missing values into three segments:

1. From 30.09.2017 to 30.12.2017 (2208 samples, or 11.2% of the dataset), found in `partitioned/dataset_0.parquet`.
2. From 05.02.2018 to 06.10.2018 (5856 samples after previous-day interpolation for the missing data bump in the middle of the segment, or 29.7% of the dataset), found in `partitioned/dataset_1.parquet`.
3. From 16.11.2018 to 14.03.2020 (11640 samples, or 59.1% of the dataset), found in `partitioned/dataset_2.parquet`.

Finally, the script that transforms the raw data into the partitioned tables is provided as a Jupyter Notebook (`dataset_integration.ipynb`).
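The integration steps described above can be sketched with pandas on synthetic data. This is only an illustration under stated assumptions, not the code in `dataset_integration.ipynb`: the helper names, the column name `consumption_kw`, and the reading of "previous-day interpolation" as "copy the value recorded 24 hours earlier" are all hypothetical.

```python
import numpy as np
import pandas as pd


def robust_average(a: pd.Series, b: pd.Series) -> pd.Series:
    """Row-wise mean of two redundant measurements, ignoring nulls."""
    # mean(axis=1) skips NaN by default: a row with one valid reading keeps
    # that reading, and a row with no valid reading stays NaN.
    return pd.concat([a, b], axis=1).mean(axis=1)


def enforce_hourly(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex onto a complete hourly grid; absent hours become NaN rows."""
    grid = pd.date_range(df.index.min(), df.index.max(), freq=pd.Timedelta(hours=1))
    return df.reindex(grid)


def fill_from_previous_day(s: pd.Series) -> pd.Series:
    """Fill gaps with the value 24 rows (hours) earlier.

    Assumed interpretation of "previous-day interpolation"; expects a
    regular hourly index such as the one produced by enforce_hourly.
    """
    filled = s.copy()
    while True:
        candidate = filled.fillna(filled.shift(24))
        # Stop once a pass fills nothing more (gaps longer than one day
        # need several passes; leading gaps cannot be filled at all).
        if candidate.isna().sum() == filled.isna().sum():
            return candidate
        filled = candidate


# Synthetic stand-ins for the duplicated columns of the two raw files.
idx = pd.date_range("2018-02-05", periods=72, freq=pd.Timedelta(hours=1), tz="UTC")
source_a = pd.Series(np.arange(72, dtype=float), index=idx)
source_b = source_a + 2.0
source_a.iloc[10] = np.nan  # only source_b is valid at this hour
source_b.iloc[20] = np.nan  # only source_a is valid at this hour

consensus = robust_average(source_a, source_b)

# Simulate six hours that are absent from the logs, then restore the grid.
table = consensus.to_frame("consumption_kw").drop(idx[30:36])
hourly = enforce_hourly(table)
filled = fill_from_previous_day(hourly["consumption_kw"])
```

Reindexing onto a full hourly grid makes the gaps explicit as NaN rows instead of hiding them as irregular timestamps, which is what lets the three segments be identified from the missing values.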

Year of publication

2025

Authors

LUT University - Rights holder

Lasse Lensu - Contributor

Samuli Honkapuro - Contributor

Computational Engineering

Sergio Mauricio Vanegas Arias - Contributor, Publisher, Curator

Kimmo Huoman - Creator

Ville Tikka - Creator

Other information

Fields of science

Computer and information sciences; Environmental sciences; Electronic, automation and communications engineering, electronics

Language

English

Open access

Open

License

Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Keywords

Weather observations, weather, forecasting, time series, Solar Panels, Building, Electricity Production, PV Solar, Electricity Consumption
