Text files from Gutenberg database

Description

This artefact contains five datasets of randomly selected text files (e-books in .txt format) from the Gutenberg database. The files differ in size and structure, and the datasets range in total size from 184MB to 1.7GB. The package contains the following datasets:

1. 184MB
2. 357MB
3. 670MB
4. 1GB
5. 1.7GB

We used these datasets to run extensive experiments on the performance of a Symmetric Searchable Encryption scheme. However, they can be used to measure the performance of any algorithm that parses documents, extracts keywords, creates dictionaries, etc.
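As an illustration of the kind of workload described above, the following Python sketch parses every .txt file under a directory, extracts keywords, builds a keyword-to-files dictionary, and times the whole pass. It is a minimal sketch and not the authors' benchmark: the function names (`extract_keywords`, `build_index`, `timed_build`), the tokenisation rule (lowercase ASCII letter runs), and the directory layout are all assumptions made for this example.

```python
import re
import time
from collections import defaultdict
from pathlib import Path

# Assumption: a "keyword" is any run of ASCII letters, compared case-insensitively.
WORD_RE = re.compile(r"[a-z]+")


def extract_keywords(text):
    """Return the set of distinct lowercase alphabetic tokens in text."""
    return set(WORD_RE.findall(text.lower()))


def build_index(dataset_dir):
    """Map each keyword to the sorted list of files (relative paths) containing it."""
    root = Path(dataset_dir)
    index = defaultdict(set)
    for path in sorted(root.rglob("*.txt")):
        # Undecodable bytes are replaced rather than raising, since the
        # encoding of individual e-books is not stated in the record.
        text = path.read_text(encoding="utf-8", errors="replace")
        name = path.relative_to(root).as_posix()
        for keyword in extract_keywords(text):
            index[keyword].add(name)
    return {keyword: sorted(files) for keyword, files in index.items()}


def timed_build(dataset_dir):
    """Build the index and return it with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    index = build_index(dataset_dir)
    elapsed = time.perf_counter() - start
    return index, elapsed
```

Calling `timed_build` once per dataset directory (for example, a hypothetical folder holding the unpacked 184MB set) gives one timing per dataset size, which can then be compared across the five sets.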

Year of publication

2019

Authors

Tampere University

Antonis Michalas - Creator

Zenodo - Publisher

Project

Other information

Fields of science

Computer and information sciences

Language

English

Open access

Open

License

Creative Commons Attribution 4.0 International (CC BY 4.0)
