Boosting Predictability: Towards Rapid Estimation of Organic Molecule Solubility

Description

Machine Learning for Predicting Solubility

The water solubility of organic molecules is critical for optimizing the performance and stability of aqueous flow batteries, as well as for various other applications. While solubility measurements are relatively straightforward in some cases, theoretical predictions remain a significant challenge. Over the past decade, machine learning algorithms have become invaluable tools for addressing this challenge. High-quality data and effective descriptors are essential for building reliable, data-driven estimation models. This repository systematically investigates how enhanced structure-based descriptors and an outlier detection procedure can improve the predictability of aqueous solubility.

Installation

Clone the repository:

    git clone git@github.com:sahashemip/ML4OrganicMoleculeSolubility.git

Navigate to the project directory:

    cd ML4OrganicMoleculeSolubility

Install the required dependencies:

    pip install -r requirements.txt

How to Use

Navigate to the notebooks directory, then open and run the Jupyter notebooks sequentially, following their numbering:

    analysis
    descriptors
    ml-models
    outlier-detection

Outlier Detection: To perform outlier detection, modify the parameters in the outlier_detector.py script. Refer to TABLE I of the associated manuscript for parameter details.

Project Structure

    notebooks/: Contains step-by-step Jupyter notebooks for analysis, feature engineering, and model development.
    scripts/: Includes Python scripts for outlier detection and custom preprocessing utilities.
    datasets/: Holds the datasets generated with the different descriptors.
    outliers/: Stores outputs related to the detected outliers.
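The description above does not say how outlier_detector.py decides which molecules are outliers. As a purely illustrative sketch, and not the repository's method, the snippet below flags samples whose prediction residual has a large modified z-score (based on the median and the median absolute deviation). The function name flag_outliers, the threshold of 3.5, and the sample solubility values are all assumptions made for this example.

```python
from statistics import median


def flag_outliers(measured, predicted, threshold=3.5):
    """Return indices whose residual has a modified z-score above threshold.

    Hypothetical helper: a generic robust-statistics filter, not the
    procedure implemented in the repository's outlier_detector.py.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")

    # Residual = measured value minus model prediction, per sample.
    residuals = [m - p for m, p in zip(measured, predicted)]
    center = median(residuals)

    # Median absolute deviation (MAD): a robust estimate of spread.
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        return []  # no spread to scale by, so nothing is flagged

    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed residuals.
    return [
        i
        for i, r in enumerate(residuals)
        if abs(0.6745 * (r - center) / mad) > threshold
    ]


if __name__ == "__main__":
    # Made-up solubility values, for illustration only.
    measured = [-2.1, -3.4, -0.8, -5.0, -1.5, -4.2]
    predicted = [-2.0, -3.6, -0.9, -1.9, -1.4, -4.0]
    print(flag_outliers(measured, predicted))
```

A median-based score is used here because a single large residual inflates the mean and standard deviation, which can hide the very point one is trying to detect. The parameters actually used by the authors are those listed in TABLE I of the associated manuscript.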

Year of publication

2025

Type of data

Authors

Department of Applied Physics

Arsalan Hashemi - Contributor, Creator

Kari Laasonen - Contributor

Pekka Peljo - Contributor

Tapio Ala-Nissila - Contributor

Zenodo - Publisher

Project

Other information

Fields of science

Physical sciences

Language

Open access

Open

License

Creative Commons Attribution 4.0 International (CC BY 4.0)
