Benchmark Dataset for Mid-Price Forecasting of Limit Order Book Data with Machine Learning Methods
Description
LOB-dataset
## Synopsis
Here we provide the normalized datasets as .txt files. The datasets are divided into two main categories: those that include the auction period and those that do not. For each category we provide three normalization set-ups: z-score, min-max, and decimal-precision normalization. Because we followed an anchored cross-validation scheme over 10 days of data for 5 stocks, each normalization set-up comes with nine folds, each consisting of a training and a testing dataset. Every training and testing dataset contains data for all five stocks. For example, the first fold contains one day of training data and one day of testing data for all five stocks. The second fold contains two days of training data and one day of testing data, where the two training days are the training and testing days of the first fold, and so on for the remaining folds.
The name of each .txt file encodes the following information, in this order:
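The fold layout described above can be sketched in a few lines of Python. This is only an illustration of the anchored scheme as described here (training always starts at day 1 and grows by one day per fold, testing uses the following day). The function name `anchored_folds` and the 1-based day indices are assumptions made for this example, not part of the dataset.

```python
def anchored_folds(n_days=10):
    """Return (train_days, test_day) pairs for an anchored cross-validation.

    Days are numbered 1..n_days (an assumption made for this sketch).
    """
    folds = []
    for k in range(1, n_days):
        train_days = list(range(1, k + 1))  # always anchored at day 1
        test_day = k + 1                    # the day right after the training window
        folds.append((train_days, test_day))
    return folds


folds = anchored_folds()
for i, (train_days, test_day) in enumerate(folds, start=1):
    print(f"fold {i}: train on days {train_days}, test on day {test_day}")
```

With 10 days this yields nine folds, which matches the nine datasets provided per normalization set-up.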
1. training or testing set
2. with or without auction period
3. type of the normalization setup
4. fold number (from 1 to 9) based on the above cross-validation method
ATTENTION: Each file contains both the feature set and the labels. Rows 1 to 144 contain the features (see 'Benchmark Dataset for Mid-Price Prediction of Limit Order Book Data' for their description), and rows 145 to 149 contain the labels for 5 classification problems. The labels are encoded as follows: ‘1’ is an up-movement, ‘2’ is a stationary condition, and ‘3’ is a down-movement.
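As a minimal sketch of how the row layout above might be used, the following Python snippet separates a loaded array into features and labels. It assumes the file has been read into a NumPy array with one sample per column (149 rows in total). The helper name `split_features_labels` and the synthetic stand-in data are hypothetical and serve only to make the example runnable without the actual files.

```python
import numpy as np

N_FEATURES = 144  # rows 1-144: features
N_LABELS = 5      # rows 145-149: labels for the 5 classification problems

# Label encoding as stated in the description.
LABEL_MEANING = {1: "up", 2: "stationary", 3: "down"}


def split_features_labels(data):
    """Split a (149, n_samples) array into feature rows and label rows."""
    if data.shape[0] != N_FEATURES + N_LABELS:
        raise ValueError(f"expected {N_FEATURES + N_LABELS} rows, got {data.shape[0]}")
    features = data[:N_FEATURES, :]
    labels = data[N_FEATURES:, :].astype(int)
    return features, labels


# Synthetic stand-in with the same row layout (not real LOB data).
rng = np.random.default_rng(0)
n_samples = 6
demo = np.vstack([
    rng.normal(size=(N_FEATURES, n_samples)),
    rng.integers(1, 4, size=(N_LABELS, n_samples)),  # values in {1, 2, 3}
])

features, labels = split_features_labels(demo)
print(features.shape, labels.shape)
print([LABEL_MEANING[v] for v in labels[0]])
```

For one of the provided files, the array could presumably be obtained with `np.loadtxt(path)`, where `path` points to the downloaded .txt file. Whether the default whitespace delimiter applies to these files is an assumption to check against the data.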
## Motivation
These are the first publicly available datasets that contain representations and annotations for a limit order book (LOB) in the High Frequency Trading universe.
## Tests
We provide baselines for these datasets based on linear and non-linear regression methods.
## Acknowledgment
The research leading to these results has received funding from the H2020 Project BigDataFinance MSCA-ITN-ETN 675044 (http://bigdatafinance.eu), Training for Big Data in Financial Research and Risk Management.
Year of publication
2019
Authors
Adamantios Ntakaris - Creator, Curator
Alexandros Iosifidis - Creator, Curator
Juho Kanniainen - Creator, Curator
Martin Magris - Creator, Curator
Other information
Language
English
Open access
Open
License
Creative Commons Attribution 4.0 International (CC BY 4.0)