Bases: BaseDataset
BPA (or 3BPA) is a dataset of configurations of a single flexible drug-like
molecule, 3-(benzyloxy)pyridin-2-amine. The molecule has a
complex dihedral potential energy surface with many local minima,
which can be challenging to approximate using classical or ML force fields.
The configurations were sampled from short (0.5 ps) MD simulations using the ANI-1x force field to
perturb the structures toward lower potential energies. Furthermore, long 25 ps MD simulations were performed at
three different temperatures (300, 600, and 1200 K) using the Langevin thermostat and a 1 fs time step.
The final configurations were re-evaluated with ORCA at the DFT level of
theory, using the ωB97X exchange-correlation functional and the 6-31G(d) basis set.
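To give a feel for why temperature matters when sampling a multi-well dihedral surface, here is a toy sketch, not the protocol used to build 3BPA. It integrates a single torsion angle with Langevin dynamics using the BAOAB splitting (one common Langevin integrator, not necessarily the one used for this dataset). The potential `V(phi)`, the effective moment of inertia, and the friction coefficient are hypothetical values chosen only for illustration; the default of 25,000 steps at 1 fs corresponds to a 25 ps trajectory.

```python
import math
import random
from typing import List

KB_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def torsion_force(phi: float) -> float:
    """-dV/dphi for the hypothetical multi-well torsion potential
    V(phi) = 0.05 * (1 + cos 3phi) + 0.02 * (1 - cos phi)   [eV]."""
    return 0.15 * math.sin(3.0 * phi) - 0.02 * math.sin(phi)


def langevin_torsion(temperature_k: float, n_steps: int = 25_000, dt_fs: float = 1.0,
                     gamma_per_fs: float = 0.01, inertia: float = 4500.0,
                     stride: int = 100, seed: int = 0) -> List[float]:
    """Toy BAOAB Langevin dynamics for one angle; returns phi every `stride` steps.

    `inertia` (eV*fs^2/rad^2) and `gamma_per_fs` are made-up illustrative values.
    """
    rng = random.Random(seed)
    kt = KB_EV_PER_K * temperature_k
    c1 = math.exp(-gamma_per_fs * dt_fs)            # velocity damping over one step
    c2 = math.sqrt((1.0 - c1 * c1) * kt / inertia)  # noise amplitude matching c1 at T
    phi, omega = math.pi / 3.0, 0.0                 # start near a local minimum, at rest
    force = torsion_force(phi)
    samples: List[float] = []
    for step in range(1, n_steps + 1):
        omega += 0.5 * dt_fs * force / inertia           # B: half kick
        phi += 0.5 * dt_fs * omega                       # A: half drift
        omega = c1 * omega + c2 * rng.gauss(0.0, 1.0)    # O: thermostat (friction + noise)
        phi += 0.5 * dt_fs * omega                       # A: half drift
        force = torsion_force(phi)
        omega += 0.5 * dt_fs * force / inertia           # B: half kick
        phi = (phi + math.pi) % (2.0 * math.pi) - math.pi  # wrap angle to [-pi, pi)
        if step % stride == 0:
            samples.append(phi)
    return samples


if __name__ == "__main__":
    # Compare the range of angles visited at the three temperatures named above.
    for t in (300.0, 600.0, 1200.0):
        traj = langevin_torsion(t)
        print(f"{t:6.0f} K: phi in [{min(traj):+.2f}, {max(traj):+.2f}] rad")
```

Running the comparison is one way to see that hotter trajectories tend to cross more barriers, which is the motivation for testing force fields on the 600 K and 1200 K samples.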
Usage:
from openqdc.datasets import BPA
dataset = BPA()
References
https://pubs.acs.org/doi/10.1021/acs.jctc.1c00647
Source code in openqdc/datasets/potential/bpa.py
class BPA(BaseDataset):
    """
    BPA (or 3BPA) is a dataset of configurations of a single flexible drug-like
    molecule, 3-(benzyloxy)pyridin-2-amine. The molecule has a
    complex dihedral potential energy surface with many local minima,
    which can be challenging to approximate using classical or ML force fields.
    The configurations were sampled from short (0.5 ps) MD simulations using the ANI-1x force field to
    perturb the structures toward lower potential energies. Furthermore, long 25 ps MD simulations were performed at
    three different temperatures (300, 600, and 1200 K) using the Langevin thermostat and a 1 fs time step.
    The final configurations were re-evaluated with ORCA at the DFT level of
    theory, using the ωB97X exchange-correlation functional and the 6-31G(d) basis set.

    Usage:
    ```python
    from openqdc.datasets import BPA
    dataset = BPA()
    ```

    References:
        https://pubs.acs.org/doi/10.1021/acs.jctc.1c00647
    """

    __name__ = "BPA"
    __energy_unit__ = "ev"
    __forces_unit__ = "ev/ang"
    __distance_unit__ = "ang"
    __force_mask__ = [True]
    __energy_methods__ = [PotentialMethod.WB97X_6_31G_D]
    __links__ = {"BPA.zip": "https://figshare.com/ndownloader/files/31325990"}

    def read_raw_entries(self) -> List[Dict]:
        import os.path as osp
        from glob import glob

        from ase.io import iread

        files = glob(osp.join(self.root, "dataset_3BPA", "*.xyz"))
        files = [f for f in files if "iso_atoms.xyz" not in f]
        all_records = []

        for file in files:
            subset = np.array([osp.basename(file).split(".")[0]])

            for atoms in iread(file, format="extxyz"):
                all_records.append(read_bpa_record(subset, atoms))

        return all_records

    def __getitem__(self, idx):
        data = super().__getitem__(idx)
        data.__setattr__("split", self._convert_array(self.data["split"][idx]))
        return data
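The file-selection step in `read_raw_entries` can be isolated and tried without downloading the dataset. The stdlib-only sketch below reproduces just that part: it globs `dataset_3BPA/*.xyz`, drops `iso_atoms.xyz`, and labels each file with its basename up to the first `.`, which is how `subset` is derived above. The helper name `list_bpa_subsets` and the file names in the example are hypothetical; parsing the frames themselves (done with `ase.io.iread` in the listing) is omitted.

```python
import os
import os.path as osp
import tempfile
from glob import glob
from typing import Dict


def list_bpa_subsets(root: str) -> Dict[str, str]:
    """Hypothetical helper: map subset label -> .xyz path, mirroring read_raw_entries."""
    files = glob(osp.join(root, "dataset_3BPA", "*.xyz"))
    # iso_atoms.xyz is excluded, exactly as in the listing's filter.
    files = [f for f in files if "iso_atoms.xyz" not in f]
    # The subset label is the file name up to its first ".".
    return {osp.basename(f).split(".")[0]: f for f in sorted(files)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(osp.join(tmp, "dataset_3BPA"))
        # Hypothetical file names, for illustration only.
        for name in ("train_300K.xyz", "test_1200K.xyz", "iso_atoms.xyz"):
            open(osp.join(tmp, "dataset_3BPA", name), "w").close()
        print(sorted(list_bpa_subsets(tmp)))  # ['test_1200K', 'train_300K']
```

Because the label is taken from the file name, every frame read from one `.xyz` file shares the same subset value.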
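The class declares energies in eV (`__energy_unit__ = "ev"`), forces in eV/Å (`__forces_unit__ = "ev/ang"`), and distances in Å. When comparing against results reported in kcal/mol or atomic units, a plain conversion like the sketch below may help. The unit keys other than `"ev"` and `"ev/ang"` are labels chosen for this example, not openqdc's own unit names, and the factors are rounded CODATA values.

```python
from typing import Dict

# Rounded CODATA-based factors for converting *from* eV (energies) and eV/Å (forces).
# "ev" and "ev/ang" mirror the class attributes; the other keys are illustrative labels.
ENERGY_FROM_EV: Dict[str, float] = {
    "ev": 1.0,
    "kcal/mol": 23.060548,
    "kj/mol": 96.485332,
    "hartree": 1.0 / 27.211386,
}
FORCE_FROM_EV_PER_ANG: Dict[str, float] = {
    "ev/ang": 1.0,
    "kcal/mol/ang": 23.060548,
    "hartree/bohr": 0.529177211 / 27.211386,  # 1 bohr = 0.529177211 Å
}


def convert_energy(value_ev: float, unit: str) -> float:
    """Convert an energy given in eV to `unit`."""
    key = unit.lower()
    if key not in ENERGY_FROM_EV:
        raise ValueError(f"unsupported energy unit: {unit!r}")
    return value_ev * ENERGY_FROM_EV[key]


def convert_force(value_ev_per_ang: float, unit: str) -> float:
    """Convert a force component given in eV/Å to `unit`."""
    key = unit.lower()
    if key not in FORCE_FROM_EV_PER_ANG:
        raise ValueError(f"unsupported force unit: {unit!r}")
    return value_ev_per_ang * FORCE_FROM_EV_PER_ANG[key]


if __name__ == "__main__":
    print(f"{convert_energy(-0.5, 'kcal/mol'):.3f}")  # -11.530
```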