Bases: BaseDataset
OrbNet Denali is a collection of 2.3 million conformers from 212,905 unique molecules. It covers a range of organic
molecules, including protonation and tautomeric states, non-covalent interactions, common salts, and counterions,
and spans the most common elements in biochemistry and organic chemistry. Geometries are generated in two steps.
First, four energy-minimized conformers are generated for each molecule with the ENTOS BREEZE conformer generator.
Second, starting from these four conformers, non-equilibrium geometries are generated at the GFN1-xTB level of
theory, either by normal mode sampling at 300 K or by ab initio molecular dynamics (AIMD) for 200 fs at 500 K.
Energies are calculated with the DFT method wB97X-D3/def2-TZVP and the semi-empirical method GFN1-xTB.
Usage:
from openqdc.datasets import OrbnetDenali
dataset = OrbnetDenali()
References
https://arxiv.org/abs/2107.00299
https://figshare.com/articles/dataset/OrbNet_Denali_Training_Data/14883867
Source code in openqdc/datasets/potential/orbnet_denali.py
class OrbnetDenali(BaseDataset):
    """
    Orbnet Denali is a collection of 2.3 million conformers from 212,905 unique molecules. Molecules include a range
    of organic molecules with protonation and tautomeric states, non-covalent interactions, common salts, and
    counterions, spanning the most common elements in bio and organic chemistry. Geometries are generated in 2 steps.
    First, four energy-minimized conformations are generated for each molecule using the ENTOS BREEZE conformer
    generator. Second, using the four energy-minimized conformers, non-equilibrium geometries are generated using
    normal mode sampling at 300K or ab initio molecular dynamics (AIMD) for 200fs at 500K; using GFN1-xTB level of
    theory. Energies are calculated using DFT method wB97X-D3/def2-TZVP and semi-empirical method GFN1-xTB level of
    theory.

    Usage:
    ```python
    from openqdc.datasets import OrbnetDenali
    dataset = OrbnetDenali()
    ```

    References:
        https://arxiv.org/abs/2107.00299\n
        https://figshare.com/articles/dataset/OrbNet_Denali_Training_Data/14883867
    """

    __name__ = "orbnet_denali"
    __energy_methods__ = [
        PotentialMethod.WB97X_D3_DEF2_TZVP,
        PotentialMethod.GFN1_XTB,
    ]  # ["wb97x-d3/def2-tzvp", "gfn1_xtb"]
    energy_target_names = ["dft_energy", "xtb1_energy"]
    __energy_unit__ = "hartree"
    __distance_unit__ = "ang"
    __forces_unit__ = "hartree/ang"
    __links__ = {
        "orbnet_denali.tar.gz": "https://figshare.com/ndownloader/files/28672287",
        "orbnet_denali_targets.tar.gz": "https://figshare.com/ndownloader/files/28672248",
    }

    def read_raw_entries(self):
        label_path = p_join(self.root, "denali_labels.csv")
        df = pd.read_csv(label_path, usecols=["sample_id", "mol_id", "subset", "dft_energy", "xtb1_energy"])
        labels = {
            mol_id: group.drop(["mol_id"], axis=1).drop_duplicates("sample_id").set_index("sample_id").to_dict("index")
            for mol_id, group in df.groupby("mol_id")
        }

        fn = lambda x: read_archive(x[0], x[1], self.root, self.energy_target_names)
        res = dm.parallelized(fn, list(labels.items()), scheduler="threads", n_jobs=-1, progress=True)
        samples = sum(res, [])
        return samples
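
The shape of the `labels` mapping built in `read_raw_entries` can be hard to picture from the chained pandas calls. The sketch below is not the library's code: it reproduces the same grouping with the standard library only, on a made-up CSV, to show the resulting structure `{mol_id: {sample_id: {...}}}` and the first-occurrence de-duplication on `sample_id`. The toy values, the `group_labels` helper, and the per-sample records used in the flattening step are all hypothetical.

```python
import csv
import io
import itertools

# Toy stand-in for denali_labels.csv; all values are invented for illustration.
TOY_CSV = """sample_id,mol_id,subset,dft_energy,xtb1_energy
s1,molA,train,-1.0,-0.9
s2,molA,train,-1.1,-1.0
s1,molA,train,-9.9,-9.9
s3,molB,test,-2.0,-1.8
"""


def group_labels(csv_text):
    """Hypothetical helper: build {mol_id: {sample_id: {column: value}}}."""
    labels = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_mol = labels.setdefault(row["mol_id"], {})
        # Keep only the first row seen for each sample_id within a molecule,
        # which is what drop_duplicates("sample_id") does by default.
        if row["sample_id"] in per_mol:
            continue
        per_mol[row["sample_id"]] = {
            "subset": row["subset"],
            "dft_energy": float(row["dft_energy"]),
            "xtb1_energy": float(row["xtb1_energy"]),
        }
    return labels


labels = group_labels(TOY_CSV)

# Stand-in for the per-molecule read step: one list of records per molecule.
per_molecule = [
    [{"mol_id": mol_id, "sample_id": sid, **vals} for sid, vals in samples.items()]
    for mol_id, samples in labels.items()
]

# Flatten the list of lists; sum(res, []) in the source yields the same result.
flat = list(itertools.chain.from_iterable(per_molecule))
```

`itertools.chain.from_iterable` is used here in place of `sum(res, [])` because repeated list concatenation copies the accumulated list at every step, which may matter with hundreds of thousands of molecules; the output is the same flat list.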