LUT University Energy Consumption/Production Dataset
Description
The data were collected at LUT University, Lappeenranta, Finland. Except where measurement or API errors occurred, all variables were sampled hourly and logged with UTC timestamps. The dataset comprises:
- The aggregated energy consumed by an entire building [kW].
- The aggregated electricity generated by the PV panel array [kW].
- Day-ahead (ELSPOT) prices for the Finnish Market [€/MWh].
  - This information is made publicly available by [ENTSO-e](https://newtransparency.entsoe.eu/market/energyPrices). The version found in `raw/elspot.parquet` comprises only the relevant years and uses UTC timestamps instead of local time.
- Meteorological variables measured 6 km away from campus at the Lappeenranta airport by the [Finnish Meteorological Institute](https://en.ilmatieteenlaitos.fi/) (see table below).
| Variable | Unit |
| --- | --- |
| Air Temperature | °C |
| Cloud Amount | Okta |
| Dew Point Temperature | °C |
| Global/Diffuse Radiation | W/m² |
| Gust Speed | m/s |
| Horizontal Visibility | m |
| Pressure | hPa |
| Relative Humidity | % |
| Sunshine | % |
| Wind Direction | ° |
| Wind Speed | m/s |
The production and consumption columns are stored in two separate files (`raw/{consumption/production}.parquet`). For consistency, the two datasets were clipped to their overlapping period and joined into a single table. Discrepancies between columns duplicated across the two files arise from values missing in one of the two sources; the consensus value for these redundant measurements is a null-aware average: the mean of the two values when both are present, otherwise whichever value is not null.
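The clip, join, and consensus steps can be sketched with pandas as follows. This is a minimal illustration, not the code from the notebook: the function `merge_sources` and the column names `load_kw`, `pv_kw`, and `temp_c` are hypothetical, and the sketch assumes both tables carry a sorted hourly UTC `DatetimeIndex`.

```python
import numpy as np
import pandas as pd


def merge_sources(consumption: pd.DataFrame, production: pd.DataFrame) -> pd.DataFrame:
    """Clip two tables to their common period and merge duplicated columns."""
    # Overlapping period: latest start to earliest end (both inclusive).
    start = max(consumption.index.min(), production.index.min())
    end = min(consumption.index.max(), production.index.max())
    left = consumption.loc[start:end]
    right = production.loc[start:end]

    # Suffixes are applied only to columns present in both tables.
    joined = left.join(right, how="outer", lsuffix="_a", rsuffix="_b")

    # Null-aware average: mean of both values, or the single non-null one.
    # If both are null, the consensus value stays null.
    for col in left.columns.intersection(right.columns):
        pair = [f"{col}_a", f"{col}_b"]
        joined[col] = joined[pair].mean(axis=1, skipna=True)
        joined = joined.drop(columns=pair)
    return joined


# Toy data with hypothetical column names; temp_c is duplicated in both tables.
idx = pd.date_range("2018-01-01", periods=4, freq="h", tz="UTC")
consumption = pd.DataFrame(
    {"load_kw": [10.0, 11.0, 12.0, 13.0], "temp_c": [1.0, np.nan, 3.0, np.nan]},
    index=idx,
)
production = pd.DataFrame(
    {"pv_kw": [0.0, 0.5, 0.7], "temp_c": [1.0, 2.0, 5.0]},
    index=idx[1:],
)
merged = merge_sources(consumption, production)
```

In this toy case the first hour is dropped because only one table covers it, and `temp_c` is averaged only where both sources report a value.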
An hourly time index was first enforced by upsampling without interpolation, so hours absent from the raw data remain as missing values. A graphical analysis of the raw data revealed that missing values naturally split the measurements into three segments:
1. From 30.09.2017 to 30.12.2017 (2208 samples, or 11.2% of the dataset), found in `partitioned/dataset_0.parquet`.
2. From 05.02.2018 to 06.10.2018 (5856 samples after previous-day interpolation of a run of missing data in the middle of the segment, or 29.7% of the dataset), found in `partitioned/dataset_1.parquet`.
3. From 16.11.2018 to 14.03.2020 (11640 samples, or 59.1% of the dataset), found in `partitioned/dataset_2.parquet`.
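The hourly enforcement, the search for gaps, and the previous-day interpolation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the notebook's code: the helper names (`enforce_hourly`, `missing_runs`, `fill_from_previous_day`) and the column `load_kw` are hypothetical, timestamps are assumed to fall exactly on the hour, and "previous-day interpolation" is interpreted here as copying the value observed 24 hours earlier. The original segments were identified graphically; `missing_runs` only shows one way to list candidate gaps.

```python
import numpy as np
import pandas as pd


def enforce_hourly(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex onto a complete hourly grid; absent hours become NaN rows."""
    grid = pd.date_range(df.index.min(), df.index.max(), freq="h")
    return df.reindex(grid)


def missing_runs(df: pd.DataFrame) -> list:
    """Return (start, end, length) for each run of rows with a missing value."""
    missing = df.isna().any(axis=1)
    # A new run id starts whenever the missing/present state changes.
    run_id = missing.astype(int).diff().ne(0).cumsum()
    runs = []
    for _, block in missing.groupby(run_id):
        if block.iloc[0]:
            runs.append((block.index[0], block.index[-1], len(block)))
    return runs


def fill_from_previous_day(df: pd.DataFrame, period: int = 24) -> pd.DataFrame:
    """Fill missing values with the value observed `period` rows earlier."""
    filled = df.copy()
    while True:
        # Repeat so that gaps longer than one day are filled day by day.
        candidate = filled.fillna(filled.shift(period))
        if candidate.equals(filled):
            return filled
        filled = candidate


# Toy series with three hourly rows missing from the log.
idx = pd.date_range("2018-02-05", periods=72, freq="h", tz="UTC")
raw = pd.DataFrame({"load_kw": np.arange(72, dtype=float)}, index=idx)
raw = raw.drop(idx[30:33])

hourly = enforce_hourly(raw)             # 72 rows, 3 of them NaN
runs = missing_runs(hourly)              # one run of length 3
filled = fill_from_previous_day(hourly)  # NaNs replaced by values from 24 h earlier
```

Long runs reported by `missing_runs` would mark boundaries between segments, while short ones inside a segment are candidates for filling.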
Finally, the script that transforms the raw data into the partitioned tables is provided as a Jupyter Notebook (`dataset_integration.ipynb`).
Year of publication
2025
Type of data
Authors
LUT University - Rights holder
Kimmo Huoman - Creator
Project
Other information
Fields of science
Computer and information sciences; Environmental sciences; Electronic, automation and communications engineering, electronics
Language
English
Open access
Open