Adsorption capacity evolution in the literature
Description
This dataset contains a list of articles (11,664 in total) extracted from the Scopus database concerning the adsorption of ammonium, arsenic, lead, methylene blue, or nitrate from water. Using the freely available abstracts, the adsorption capacities and the material class of each adsorbent were tabulated with a large language model.
Literature was searched in the Scopus database using the search phrase ((adsorption OR adsorbent OR sorption OR sorbent) AND "adsorbate" AND "mg/g"), where “adsorbate” was replaced by “methylene blue”, “arsenic”, “(ammonium OR NH4+)”, “(nitrate OR NO3-)”, or “(lead OR Pb)”. The search was limited to abstracts, with the document type restricted to articles and the source type to journals. The search results (11,664 articles) were exported from the Scopus database as comma-separated value (CSV) files.
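As an illustration, the five search phrases can be generated from the template described above. This is a minimal sketch, not the procedure actually used: the names ADSORBATE_TERMS and build_search_phrase are hypothetical, the quoting of the individual terms is an assumption, and the limits on abstracts, document type, and source type are not encoded because the exact Scopus query syntax is not given in this description.

```python
# Adsorbate terms as listed in the description (quoting is an assumption).
ADSORBATE_TERMS = [
    '"methylene blue"',
    "arsenic",
    "(ammonium OR NH4+)",
    "(nitrate OR NO3-)",
    "(lead OR Pb)",
]

# Common part of every search phrase.
BASE = "(adsorption OR adsorbent OR sorption OR sorbent)"


def build_search_phrase(adsorbate_term: str) -> str:
    """Fill the adsorbate slot of the search-phrase template (hypothetical helper)."""
    return f'({BASE} AND {adsorbate_term} AND "mg/g")'


phrases = [build_search_phrase(term) for term in ADSORBATE_TERMS]
for phrase in phrases:
    print(phrase)
```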
To estimate the coverage of the literature, another search was conducted by removing “mg/g” from the search phrase and combining all the adsorbates into one search phrase (i.e., the new search phrase was: “(adsorption OR adsorbent OR sorption OR sorbent) AND ("methylene blue" OR arsenic OR ammonium OR NH4+ OR nitrate OR NO3- OR lead OR Pb)”), applying similar exclusion criteria (i.e., publication year until 2023 and language in English). This search returned 106,120 articles, so the dataset covers approx. 11% of all articles addressing the selected adsorbates.
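The coverage figure follows directly from the two article counts reported above. The short check below only reproduces that arithmetic; the variable names are illustrative.

```python
# Article counts reported in the description.
articles_in_dataset = 11_664      # searches that include "mg/g"
articles_broad_search = 106_120   # combined search without "mg/g"

# Share of the broader literature covered by the dataset.
coverage = articles_in_dataset / articles_broad_search
print(f"Coverage: {coverage:.1%}")  # prints "Coverage: 11.0%"
```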
The collected text corpus was analyzed sequentially by an AI solution called Aida (developed by AI4Value Oy, Finland). First, the information for each material was extracted from the abstract into a structured format containing the name of the adsorbent, the name of the adsorbate, the adsorption capacity, and the unit. If several materials were mentioned, only the material with the highest capacity was selected. Finally, the category of the best adsorbent material was deduced from the abstract and assigned to one of 23 predefined classes (i.e., carbon nanotubes, biomass/biopolymers, metal oxides, metal-organic frameworks, metal hydroxides, hydrogels, other natural minerals, graphene oxides, covalent organic frameworks, metal carbonates, biochars, zeolites, clay minerals, MXenes, other aluminosilicates, hydroxyapatites, other unclassified, other organic polymers, commercial ion-exchange resins, Mo sulfides, activated carbons, layered double hydroxides, and silica). GPT-3.5 Turbo was selected as the large language model in the Aida software for this task.
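The structured record and the "highest capacity wins" rule described above can be sketched as follows. This is not the Aida implementation, which is not documented here: the Extraction class and best_material function are hypothetical, the example values are invented purely for illustration, and the sketch assumes that all capacities for one abstract share the same unit.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Extraction:
    """One material extracted from an abstract (hypothetical structure)."""
    adsorbent: str
    adsorbate: str
    capacity: float
    unit: str


def best_material(records: Iterable[Extraction]) -> Optional[Extraction]:
    """Return the record with the highest capacity, or None if there are none.

    Assumes every record uses the same unit; mixed units would need
    conversion before comparison.
    """
    return max(records, key=lambda r: r.capacity, default=None)


# Invented example values, not taken from the dataset.
records = [
    Extraction("raw biochar", "lead", 45.0, "mg/g"),
    Extraction("modified biochar", "lead", 120.5, "mg/g"),
]
best = best_material(records)
print(best.adsorbent, best.capacity, best.unit)
```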
The spreadsheets containing the adsorption capacities were then manually checked to remove clearly false results (e.g., the wrong adsorbate, or a parameter other than adsorption capacity retrieved). Results reported in units other than mg g-1 (e.g., µg g-1 or mmol g-1) were manually converted to mg g-1. The adsorption capacities were then grouped by year of publication for analysis.
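The unit conversions mentioned above were done manually; the sketch below only shows the underlying arithmetic. The function name to_mg_per_g and the accepted unit spellings are assumptions. Converting mmol g-1 requires the molar mass of the adsorbate (1 mmol g-1 equals M mg g-1 for a molar mass of M g mol-1), which the caller must supply because the reporting basis (e.g., the ion or the element) depends on the article.

```python
from typing import Optional


def to_mg_per_g(value: float, unit: str,
                molar_mass_g_per_mol: Optional[float] = None) -> float:
    """Convert an adsorption capacity to mg/g (hypothetical helper)."""
    # Normalize spelling: drop spaces, map both micro signs to "u", lowercase.
    u = (unit.replace(" ", "")
             .replace("\u00b5", "u")
             .replace("\u03bc", "u")
             .lower())
    if u in ("mg/g", "mgg-1"):
        return value
    if u in ("ug/g", "ugg-1"):
        return value / 1000.0          # 1000 µg = 1 mg
    if u in ("mmol/g", "mmolg-1"):
        if molar_mass_g_per_mol is None:
            raise ValueError("molar mass is required to convert mmol/g")
        return value * molar_mass_g_per_mol  # mmol/g * mg/mmol = mg/g
    raise ValueError(f"unsupported unit: {unit!r}")


print(to_mg_per_g(500.0, "\u00b5g g-1"))     # 500 µg/g expressed in mg/g
print(to_mg_per_g(2.0, "mmol/g", 207.2))     # lead, M = 207.2 g/mol
```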
The reliability of the results was verified by randomly selecting 100 studies from the spreadsheet and manually finding the maximum adsorption capacities from their abstracts. When these were compared with the values obtained by Aida, 98% of the results were similar. In the remaining 2%, the abstract was ambiguous about the reported maximum adsorption capacity (e.g., several numbers were mentioned, and the sentence structure was unclear). Based on this validation, the results can be considered sufficiently accurate for statistical analyses.
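A validation of this kind can be sketched as a random sample followed by an agreement count. The helpers sample_ids and agreement_rate are hypothetical, and treating two values as "similar" when they are numerically equal within a small tolerance is an assumption; the description does not state the exact criterion. The toy data exist only to make the example runnable.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence


def sample_ids(ids: Sequence[Hashable], n: int = 100, seed: int = 0) -> List[Hashable]:
    """Draw n distinct article identifiers at random (reproducible via seed)."""
    return random.Random(seed).sample(list(ids), n)


def agreement_rate(extracted: Dict[Hashable, float],
                   manual: Dict[Hashable, float],
                   rel_tol: float = 1e-6) -> float:
    """Fraction of manually checked articles whose extracted value agrees."""
    matches = sum(
        math.isclose(extracted[key], manual[key], rel_tol=rel_tol)
        for key in manual
    )
    return matches / len(manual)


# Toy data: 1000 articles, of which 100 are sampled and checked by hand.
extracted = {i: float(i + 1) for i in range(1000)}
checked = sample_ids(list(extracted), n=100, seed=42)
manual = {i: extracted[i] for i in checked}
for i in checked[:2]:          # pretend two manual readings disagree
    manual[i] = extracted[i] * 2
print(f"Agreement: {agreement_rate(extracted, manual):.0%}")
```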
Year of publication
2024
Authors
AI4Value Oy
Juhani Teeriniemi - Creator
Other information
Open access
Open
License
Creative Commons Attribution 4.0 International (CC BY 4.0)
Keywords
water treatment, wastewater treatment, adsorption, AI, large language models