Benchmark Dataset for Mid-Price Forecasting of Limit Order Book Data with Machine Learning Methods

Description

LOB-dataset

## Synopsis

Here we provide the normalized datasets as .txt files. The datasets are divided into two main categories: datasets that include the auction period and datasets that do not. For each of these two categories we provide three normalization set-ups based on z-score, min-max, and decimal-precision normalization.

Since we followed the anchored cross-validation method for 10 days for 5 stocks, the user can find nine (cross-fold) datasets for each normalization set-up for training and testing. Every training and testing dataset contains information for all the stocks. For example, the first fold contains one day of training and one day of testing for all five stocks. The second fold contains a training dataset of two days and a testing dataset of one day. The two days in that training dataset are the training day and the testing day of the first fold, and so on.

The title of each .txt file contains the following information, in this order:

1. training or testing set
2. with or without auction period
3. type of normalization set-up
4. fold number (from 1 to 9) based on the above cross-validation method

ATTENTION: The given files contain both the feature set and the labels. Rows 1 to 144 contain the features (see 'Benchmark Dataset for Mid-Price Prediction of Limit Order Book Data' for the description) and rows 145 to 149 contain the labels for 5 classification problems. The labels are encoded as follows: ‘1’ is an up-movement, ‘2’ is a stationary condition, and ‘3’ is a down-movement.

## Motivation

These are the first publicly available datasets that contain representations and annotations for a limit order book (LOB) in the High Frequency Trading universe.

## Tests

We provide baselines for these datasets based on linear and non-linear regression methods.
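
As a rough illustration of the layout described in the synopsis, the following Python sketch enumerates the nine anchored folds and separates the 144 feature rows from the 5 label rows. It is not part of the dataset or of the authors' code: the function names `anchored_folds` and `split_features_labels` are hypothetical, the array is synthetic, and the sketch assumes that each file is a numeric matrix with one column per sample.

```python
import numpy as np

N_FEATURE_ROWS = 144  # rows 1-144 hold the features (per the description)
N_LABEL_ROWS = 5      # rows 145-149 hold the labels of 5 classification problems

# Label encoding stated in the description.
LABEL_MEANING = {1: "up-movement", 2: "stationary", 3: "down-movement"}


def anchored_folds(n_days=10):
    """Return (train_days, test_day) pairs for anchored cross-validation.

    Fold k trains on days 1..k and tests on day k + 1, so 10 days give 9 folds.
    """
    return [(list(range(1, k + 1)), k + 1) for k in range(1, n_days)]


def split_features_labels(data):
    """Split a (149, n_samples) array into features and integer labels."""
    expected = N_FEATURE_ROWS + N_LABEL_ROWS
    if data.shape[0] != expected:
        raise ValueError(f"expected {expected} rows, got {data.shape[0]}")
    features = data[:N_FEATURE_ROWS, :]
    labels = data[N_FEATURE_ROWS:, :].astype(int)
    return features, labels


# Synthetic stand-in for one file (assumption: samples are stored as columns).
# A real file could presumably be read with np.loadtxt(path) if it is
# whitespace-delimited numeric text; that is an assumption, not a documented fact.
rng = np.random.default_rng(0)
fake = np.vstack([
    rng.normal(size=(N_FEATURE_ROWS, 20)),
    rng.integers(1, 4, size=(N_LABEL_ROWS, 20)),  # labels drawn from {1, 2, 3}
])

features, labels = split_features_labels(fake)
folds = anchored_folds()

for train_days, test_day in folds:
    print(f"train on days {train_days}, test on day {test_day}")
```

Each row of `labels` corresponds to one of the five classification problems, and `LABEL_MEANING` maps the stored integers back to the movement classes.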

## Acknowledgment

The research leading to these results has received funding from the H2020 Project BigDataFinance MSCA-ITN-ETN 675044 (http://bigdatafinance.eu), Training for Big Data in Financial Research and Risk Management.

Year of publication

2019

Authors

Tampere University

Adamantios Ntakaris - Creator, Curator

Alexandros Iosifidis - Creator, Curator

Juho Kanniainen - Creator, Curator

Martin Magris - Creator, Curator

Moncef Gabbouj - Creator, Curator

Language

English

Open access

Open

License

Creative Commons Attribution 4.0 International (CC BY 4.0)
