QM7X

Bases: BaseDataset

QM7X is a collection of almost 4.2 million conformers of 6,950 unique organic molecules. The molecules, taken from the GDB13 database, contain up to seven heavy atoms (C, N, O, S, Cl). To generate conformations, OpenBabel first produces an initial structure with the MMFF94 force field. From this initial structure, metastable conformational isomers are generated with the Confab tool, again using the MMFF94 force field. Each structure is then re-optimized with density-functional tight binding (DFTB) supplemented with many-body dispersion (MBD) interactions, and the lowest-energy structure is taken as the final equilibrium conformer. Additionally, non-equilibrium conformations are generated by displacing the equilibrium geometry along a linear combination of normal-mode coordinates computed at the DFTB3-MBD level within the harmonic approximation. The dataset provides energies for each geometry computed with the PBE0-MBD and DFTB3-MBD methods.
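
The displacement step for non-equilibrium conformations can be illustrated with a small NumPy sketch. This is not the generation code used to build QM7X: the helper name `displace_along_modes`, the toy geometry, and the two made-up mode vectors are all hypothetical, and real normal modes would come from a DFTB3-MBD harmonic analysis.

```python
import numpy as np


def displace_along_modes(eq_positions, modes, coefficients):
    """Shift a geometry by a linear combination of mode vectors.

    eq_positions: (n_atoms, 3) equilibrium Cartesian coordinates.
    modes: (n_modes, n_atoms, 3) displacement vectors (hypothetical input).
    coefficients: (n_modes,) weights of the linear combination.
    """
    eq_positions = np.asarray(eq_positions, dtype=float)
    modes = np.asarray(modes, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    # sum_k c_k * mode_k, contracted over the mode index.
    displacement = np.tensordot(coefficients, modes, axes=1)
    return eq_positions + displacement


# Toy 3-atom geometry (angstrom) and two made-up modes, for illustration only.
eq = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
toy_modes = np.zeros((2, 3, 3))
toy_modes[0, 1, 0] = 1.0  # mode 0 moves atom 1 along x
toy_modes[1, 2, 1] = 1.0  # mode 1 moves atom 2 along y
displaced = displace_along_modes(eq, toy_modes, [0.05, -0.02])
```

Varying the coefficients yields different distorted geometries around the same equilibrium structure.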

Usage:

from openqdc.datasets import QM7X
dataset = QM7X()

References

https://arxiv.org/abs/2006.15139

https://zenodo.org/records/4288677

Source code in openqdc/datasets/potential/qm7x.py
class QM7X(BaseDataset):
    """
    QM7X is a collection of almost 4.2 million conformers of 6,950 unique organic molecules. The molecules, taken
    from the GDB13 database, contain up to seven heavy atoms (C, N, O, S, Cl). To generate conformations, OpenBabel
    first produces an initial structure with the MMFF94 force field. From this initial structure, metastable
    conformational isomers are generated with the Confab tool, again using the MMFF94 force field. Each structure
    is then re-optimized with density-functional tight binding (DFTB) supplemented with many-body dispersion (MBD)
    interactions, and the lowest-energy structure is taken as the final equilibrium conformer. Additionally,
    non-equilibrium conformations are generated by displacing the equilibrium geometry along a linear combination
    of normal-mode coordinates computed at the DFTB3-MBD level within the harmonic approximation. The dataset
    provides energies for each geometry computed with the PBE0-MBD and DFTB3-MBD methods.

    Usage:
    ```python
    from openqdc.datasets import QM7X
    dataset = QM7X()
    ```

    References:
        https://arxiv.org/abs/2006.15139\n
        https://zenodo.org/records/4288677
    """

    __name__ = "qm7x"

    __energy_methods__ = [PotentialMethod.PBE0_DEF2_TZVP, PotentialMethod.DFT3B]  # "pbe0/def2-tzvp", "dft3b"

    energy_target_names = ["ePBE0+MBD", "eDFTB+MBD"]

    __force_mask__ = [True, False]

    force_target_names = ["pbe0FOR"]

    __energy_unit__ = "ev"
    __distance_unit__ = "ang"
    __forces_unit__ = "ev/ang"
    __links__ = {f"{i}000.xz": f"https://zenodo.org/record/4288677/files/{i}000.xz" for i in range(1, 9)}

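    def read_raw_entries(self):
        # Collect entries from the eight raw files "1000" ... "8000" under self.root.
        samples = []
        for i in range(1, 9):
            raw_path = p_join(self.root, f"{i}000")
            data = load_hdf5_file(raw_path)
            # Pass each top-level key to read_mol together with the energy and force target names.
            samples += [
                read_mol(data[k], k, self.energy_target_names, self.force_target_names) for k in tqdm(data.keys())
            ]

        return samples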
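
To see what `__links__` and `read_raw_entries` iterate over, the snippet below rebuilds the same mapping outside the class. It reuses only the expressions from the source above; the variable names `links` and `raw_names` are introduced here for illustration, and the reading that the raw files are the decompressed archives is an assumption based on the missing `.xz` suffix.

```python
# Rebuild the archive-name -> URL mapping defined in QM7X.__links__.
links = {f"{i}000.xz": f"https://zenodo.org/record/4288677/files/{i}000.xz" for i in range(1, 9)}

# read_raw_entries opens "1000" ... "8000" (no .xz suffix) under the dataset root,
# which presumably are the decompressed archives (assumption).
raw_names = [f"{i}000" for i in range(1, 9)]

for archive, url in links.items():
    print(archive, "->", url)
```

Both the download links and the read loop cover eight files, so a missing archive would surface when the corresponding raw file is opened.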

QM7X_V2

Bases: QM7X

QM7X_V2 is an extension of the QM7X dataset containing PM6 labels for each of the 4.2M geometries.

Usage:

from openqdc.datasets import QM7X_V2
dataset = QM7X_V2()

Source code in openqdc/datasets/potential/qm7x.py
class QM7X_V2(QM7X):
    """
    QM7X_V2 is an extension of the QM7X dataset containing PM6 labels for each of the 4.2M geometries.

    Usage:
    ```python
    from openqdc.datasets import QM7X_V2
    dataset = QM7X_V2()
    ```
    """

    __name__ = "qm7x_v2"
    __energy_methods__ = QM7X.__energy_methods__ + [PotentialMethod.PM6]
    __force_mask__ = QM7X.__force_mask__ + [False]
    energy_target_names = QM7X.energy_target_names + ["PM6"]
    force_target_names = QM7X.force_target_names
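
The way QM7X_V2 extends its parent's label lists can be sketched with plain strings standing in for the `PotentialMethod` members. This is an illustration only: the variable names are hypothetical, and treating a `True` mask entry as "this method has force labels" is an assumption inferred from the single entry in `force_target_names`.

```python
# Plain-string stand-ins for the PotentialMethod members used in the source.
base_methods = ["PBE0_DEF2_TZVP", "DFT3B"]
base_force_mask = [True, False]

# QM7X_V2 appends one method and one mask entry, mirroring the class attributes.
v2_methods = base_methods + ["PM6"]
v2_force_mask = base_force_mask + [False]

# Assumed reading: methods whose mask entry is True carry force labels.
methods_with_forces = [m for m, has_forces in zip(v2_methods, v2_force_mask) if has_forces]
print(methods_with_forces)  # ['PBE0_DEF2_TZVP']
```

Keeping the method list and the force mask the same length is what lets the two be paired position by position.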