Text files from Gutenberg database
Description
Text files of different size and structure. More precisely, we selected random data from the Gutenberg dataset. This artefact contains five different datasets with random text files (i.e. e-books in .txt format) from the Gutenberg database. The datasets that we selected ranged from text files with a total size of 184MB to a set of text files with a total size of 1.7GB. More precisely, the following datasets can be found in this package: 1. 184MB 2. 357MB 3. 670MB 4. 1GB 5. 1.7GB In our case, we used this dataset to perform extensive experiments on regarding the performance of a Symmetric Searchable Encryption scheme. However, this dataset can be used to measure the performance of any algorithm that is parsing documents, extracting keywords, creates dictionaries etc.
Show moreYear of publication
2019
Type of data
Project
Other information
Fields of science
Computer and information sciences
Language
English
Open access
Open