Dataset supporting the Machine Learning-Assisted Clustering of Amino Acids project

Description

This dataset supports the Machine Learning-Assisted Clustering of Amino Acids study, in which a model of the interactions observed at the peptide-gold nanocluster (AuNC) interface was developed. This metadata record describes the data generated during that work. To cluster molecular structures according to whether or not they interact, the SMILES strings of amino acids and peptides were generated using GenPep. The amino acid and peptide datasets, comprising SMILES strings and codes for sequences of two and three amino acids together with the structures in the protonation state corresponding to pH 7, are deposited in the folder "1.data_generation". To validate the model, molecular dynamics simulations were performed with GROMACS. The geometrical clustering results for each simulation, containing the coordinates of the most populated structures, in order, are deposited in the folder "2.geometrical_clusters". As a further validation, structures were optimized with DFT calculations using the PBE functional in GPAW; their coordinates are deposited in the folder "3.DFT_coordinates". A comprehensive description of the methods used to generate these data is provided in the forthcoming publication. The dataset is available at: https://nextcloud.jyu.fi/index.php/s/XFdkXR9njeboCSW
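
To give a sense of the size of the sequence space behind the two- and three-amino-acid codes, the following minimal sketch enumerates every such code over the 20 standard amino acids. It is an illustration only: it does not use GenPep, and both the one-letter alphabet and the assumption that all combinations are covered are hypothetical rather than taken from the dataset.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids. This alphabet is an
# assumption for illustration; the dataset may use another notation or subset.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_codes(length):
    """Return every sequence code of the given length, in alphabetical order."""
    return ["".join(combo) for combo in product(AMINO_ACIDS, repeat=length)]


dipeptides = peptide_codes(2)   # 20**2 sequences
tripeptides = peptide_codes(3)  # 20**3 sequences
print(len(dipeptides), len(tripeptides))
```

Under these assumptions there are 400 dipeptide and 8000 tripeptide codes; the actual contents of "1.data_generation" should be checked against the deposited files.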

Year of publication

2025

Type of data

Authors

Department of Chemistry

de Souza Ferrari, Brenda - Rights holder, Creator

Department of Physics

Fallah, Zohreh - Creator

Häkkinen, Hannu - Creator

Khatun, Maya - Creator

Project

Other information

Fields of science

Computer and information sciences; Physical sciences; Chemical sciences; Biochemistry, cell and molecular biology

Language

English

Open access

Embargo

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Keywords

molecular dynamics, density functional theory, machine learning, nanoclusters

Subject headings

Related to this research data