Bases: BaseDataset
The tmQM dataset contains the geometries of a large transition metal-organic compound space, covering a wide variety
of organic ligands and 30 transition metals. It provides energy labels for 86,665 mononuclear complexes calculated
at the TPSSh-D3BJ/def2-SVP DFT level of theory. Structures are first extracted from the Cambridge Structural Database
and then optimized in the gas phase with the extended tight-binding GFN2-xTB method.
Usage:
from openqdc.datasets import TMQM
dataset = TMQM()
References
https://pubs.acs.org/doi/10.1021/acs.jcim.0c01041
https://github.com/bbskjelstad/tmqm
Source code in openqdc/datasets/potential/tmqm.py
class TMQM(BaseDataset):
    """
    The tmQM dataset contains the geometries of a large transition metal-organic compound space, covering a wide variety
    of organic ligands and 30 transition metals. It provides energy labels for 86,665 mononuclear complexes calculated
    at the TPSSh-D3BJ/def2-SVP DFT level of theory. Structures are first extracted from the Cambridge Structural Database
    and then optimized in the gas phase with the extended tight-binding GFN2-xTB method.

    Usage:
    ```python
    from openqdc.datasets import TMQM
    dataset = TMQM()
    ```

    References:
        https://pubs.acs.org/doi/10.1021/acs.jcim.0c01041\n
        https://github.com/bbskjelstad/tmqm
    """

    __name__ = "tmqm"

    __energy_methods__ = [PotentialMethod.TPSSH_DEF2_TZVP]  # "tpssh/def2-tzvp"]

    energy_target_names = ["TPSSh/def2TZVP level"]

    __energy_unit__ = "hartree"
    __distance_unit__ = "ang"
    __forces_unit__ = "hartree/ang"

    # Raw files are fetched from the tmQM GitHub repository, keyed by file name.
    __links__ = {
        x: f"https://raw.githubusercontent.com/bbskjelstad/tmqm/master/data/{x}"
        for x in ["tmQM_X1.xyz.gz", "tmQM_X2.xyz.gz", "tmQM_y.csv", "Benchmark2_TPSSh_Opt.xyz"]
    }

    def read_raw_entries(self):
        # Load only the CSD code and electronic energy columns from the label file.
        df = pd.read_csv(p_join(self.root, "tmQM_y.csv"), sep=";", usecols=["CSD_code", "Electronic_E"])
        # Map each CSD code to its electronic energy.
        e_map = dict(zip(df["CSD_code"], df["Electronic_E"]))
        raw_fnames = ["tmQM_X1.xyz", "tmQM_X2.xyz", "Benchmark2_TPSSh_Opt.xyz"]
        samples = []
        # Read every geometry file, passing the energy map to read_xyz.
        for fname in raw_fnames:
            data = read_xyz(p_join(self.root, fname), e_map)
            samples += data

        return samples
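The listing above does not show `read_xyz`, so the following is only a minimal sketch of how `read_raw_entries` might pair geometries with energies. It makes several assumptions: the standard-library `csv` module stands in for `pd.read_csv`, the helper names `load_energy_map` and `parse_xyz_blocks` are hypothetical, the sample codes, coordinates, and energies are made up, and the XYZ layout (atom count, a comment line containing `CSD_code = <code>`, then one line per atom) is assumed rather than taken from the source.

```python
import csv
import io

# Hypothetical miniature of tmQM_y.csv: semicolon-separated, containing the two
# columns that read_raw_entries keeps plus one extra. All values are made up.
LABELS_CSV = """CSD_code;Electronic_E;Extra
AAAAAA;-1234.5;1.0
BBBBBB;-2345.25;2.0
"""

# Hypothetical multi-structure XYZ text. Assumed layout: atom count, a comment
# line carrying "CSD_code = <code>", then one "symbol x y z" line per atom.
GEOMETRIES_XYZ = """2
CSD_code = AAAAAA | q = 0
Fe 0.0 0.0 0.0
C 0.0 0.0 1.9

3
CSD_code = BBBBBB | q = 1
Cu 0.0 0.0 0.0
N 1.0 0.0 0.0
N -1.0 0.0 0.0
"""


def load_energy_map(text):
    """Build {CSD_code: energy}, mirroring the dict(zip(...)) step."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {row["CSD_code"]: float(row["Electronic_E"]) for row in reader}


def parse_xyz_blocks(text, e_map):
    """Hypothetical stand-in for read_xyz: parse blocks and attach energies."""
    lines = [line.strip() for line in text.splitlines()]
    samples = []
    i = 0
    while i < len(lines):
        if not lines[i]:  # skip blank separator lines between structures
            i += 1
            continue
        n_atoms = int(lines[i])
        header = lines[i + 1]
        # Split the "key = value | key = value" comment line into a dict.
        fields = dict(part.split("=", 1) for part in header.split("|"))
        fields = {k.strip(): v.strip() for k, v in fields.items()}
        code = fields["CSD_code"]
        atoms = []
        for atom_line in lines[i + 2 : i + 2 + n_atoms]:
            symbol, x, y, z = atom_line.split()
            atoms.append((symbol, float(x), float(y), float(z)))
        # Look up the energy label by CSD code.
        samples.append({"name": code, "atoms": atoms, "energy": e_map[code]})
        i += 2 + n_atoms
    return samples


e_map = load_energy_map(LABELS_CSV)
samples = parse_xyz_blocks(GEOMETRIES_XYZ, e_map)
for sample in samples:
    print(sample["name"], len(sample["atoms"]), sample["energy"])
```

Keying the energies by CSD code means the geometry files and the label file do not need to list structures in the same order. In this sketch a structure with no matching label raises a `KeyError`; how the real `read_xyz` handles that case is not shown in the source.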