Bases: BaseDataset
The ISO17 dataset contains the largest set of isomers from the QM9 dataset that share a fixed composition of
atoms (C7O2H10) arranged in different chemically valid structures. It consists of 129 molecules, each with
5,000 conformational geometries, energies, and forces sampled at a resolution of 1 fs along the molecular dynamics
trajectories. The simulations were carried out using density functional theory (DFT) in the generalized gradient
approximation (GGA) with the Perdew-Burke-Ernzerhof (PBE) functional and the Tkatchenko-Scheffler (TS) van der
Waals correction method.
Usage:
from openqdc.datasets import ISO17
dataset = ISO17()
References
https://arxiv.org/abs/1706.08566
https://arxiv.org/abs/1609.08259
https://www.nature.com/articles/sdata201422
https://pubmed.ncbi.nlm.nih.gov/10062328/
https://pubmed.ncbi.nlm.nih.gov/19257665/
Source code in openqdc/datasets/potential/iso_17.py
class ISO17(BaseDataset):
    """
    The ISO17 dataset contains the largest set of isomers from the QM9 dataset that share a fixed composition of
    atoms (C7O2H10) arranged in different chemically valid structures. It consists of 129 molecules, each with
    5,000 conformational geometries, energies, and forces sampled at a resolution of 1 fs along the molecular dynamics
    trajectories. The simulations were carried out using density functional theory (DFT) in the generalized gradient
    approximation (GGA) with the Perdew-Burke-Ernzerhof (PBE) functional and the Tkatchenko-Scheffler (TS) van der
    Waals correction method.

    Usage:
    ```python
    from openqdc.datasets import ISO17
    dataset = ISO17()
    ```

    References:
        https://arxiv.org/abs/1706.08566\n
        https://arxiv.org/abs/1609.08259\n
        https://www.nature.com/articles/sdata201422\n
        https://pubmed.ncbi.nlm.nih.gov/10062328/\n
        https://pubmed.ncbi.nlm.nih.gov/19257665/
    """

    __name__ = "iso_17"

    __energy_methods__ = [
        PotentialMethod.PBE_DEF2_TZVP,  # "pbe/def2-tzvp",
    ]

    energy_target_names = [
        "PBE-TS Energy",
    ]

    __force_mask__ = [True]

    force_target_names = [
        "PBE-TS Gradient",
    ]

    __energy_unit__ = "ev"
    __distance_unit__ = "ang"
    __forces_unit__ = "ev/ang"
    __links__ = {"iso_17.hdf5.gz": "https://zenodo.org/record/3585907/files/216.hdf5.gz"}

    def __smiles_converter__(self, x):
        """util function to convert string to smiles: useful if the smiles is
        encoded in a different format than its display format
        """
        return "-".join(x.decode("ascii").split("_")[:-1])

    def read_raw_entries(self):
        raw_path = p_join(self.root, "iso_17.h5.gz")
        samples = read_qc_archive_h5(raw_path, "iso_17", self.energy_target_names, self.force_target_names)
        return samples
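The class declares its raw units as eV for energies, Å for distances, and eV/Å for forces. As a minimal sketch of what that means when comparing against data in other units, the snippet below converts values by hand. The helper names and the rounded conversion factors are assumptions made for illustration; they are not part of openqdc, which may provide its own conversion utilities.

```python
# Rounded conversion factors (assumed here for illustration, not taken from openqdc).
EV_TO_KCAL_PER_MOL = 23.060548
EV_TO_HARTREE = 1.0 / 27.211386

# Hypothetical lookup table keyed by target energy unit.
_ENERGY_FACTORS = {
    "ev": 1.0,
    "kcal/mol": EV_TO_KCAL_PER_MOL,
    "hartree": EV_TO_HARTREE,
}


def convert_energy(value_ev: float, target: str) -> float:
    """Convert an energy given in eV to the target energy unit."""
    return value_ev * _ENERGY_FACTORS[target]


def convert_force(value_ev_per_ang: float, target_energy: str) -> float:
    """Convert a force given in eV/Å to <target_energy>/Å.

    The distance unit stays in Å, so only the energy part is rescaled.
    """
    return convert_energy(value_ev_per_ang, target_energy)


if __name__ == "__main__":
    print(convert_energy(1.0, "kcal/mol"))
    print(convert_force(0.5, "kcal/mol"))
```

Keeping the distance unit fixed makes the force conversion a plain rescaling of the energy part; changing Å to another length unit would require dividing by the corresponding length factor as well.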
__smiles_converter__(x)
Utility function that converts a stored string to SMILES. This is useful when the SMILES is
encoded in a format different from its display format.
Source code in openqdc/datasets/potential/iso_17.py
def __smiles_converter__(self, x):
    """util function to convert string to smiles: useful if the smiles is
    encoded in a different format than its display format
    """
    return "-".join(x.decode("ascii").split("_")[:-1])
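To make the string manipulation concrete, here is a standalone sketch of the same logic applied to a made-up key. The function name `smiles_converter` and the example key are hypothetical; the actual keys stored in the ISO17 archive may look different.

```python
def smiles_converter(raw_key: bytes) -> str:
    """Standalone copy of the logic in ISO17.__smiles_converter__."""
    # Decode the ASCII bytes, split on underscores, drop the final field,
    # and rejoin the remaining fields with hyphens.
    return "-".join(raw_key.decode("ascii").split("_")[:-1])


if __name__ == "__main__":
    # Hypothetical key, invented for illustration only.
    example_key = b"mol_12_0042"
    print(smiles_converter(example_key))  # mol-12
```

Note that a key containing no underscore yields an empty string, because the only field is the one that gets dropped.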