RevMD17

Bases: BaseDataset

Revised MD (RevMD17) improves upon the MD17 dataset by removing all the numerical noise present in the original dataset. The data is generated from an ab-initio molecular dynamics (AIMD) simulation where forces and energies are computed at the PBE/def2-SVP level of theory using very tight SCF convergence and a very dense DFT integration grid. The dataset contains the following molecules:

Benzene: 627000 samples

Uracil: 133000 samples

Naphthalene: 326000 samples

Aspirin: 211000 samples

Salicylic Acid: 320000 samples

Malonaldehyde: 993000 samples

Ethanol: 555000 samples

Toluene: 100000 samples

Usage:

from openqdc.datasets import RevMD17
dataset = RevMD17()

References

https://arxiv.org/abs/2007.09593
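
The class attributes in the source excerpt declare energies in kcal/mol, distances in ang, and forces in kcal/mol/ang. As a minimal sketch of what converting those values to eV-based units involves, the snippet below derives the factor from two physical constants. The constant and function names are illustrative assumptions and are not part of the openqdc API.

```python
# Illustrative unit-conversion sketch; these names are NOT openqdc API.

KJ_PER_KCAL = 4.184                # thermochemical calorie, exact by definition
KJ_PER_MOL_PER_EV = 96.48533212    # 1 eV per particle expressed in kJ/mol

# 1 kcal/mol expressed in eV (roughly 0.04336)
EV_PER_KCAL_MOL = KJ_PER_KCAL / KJ_PER_MOL_PER_EV


def energy_kcal_mol_to_ev(energy):
    """Convert a single energy from kcal/mol to eV."""
    return energy * EV_PER_KCAL_MOL


def forces_kcal_mol_ang_to_ev_ang(forces):
    """Convert per-atom force vectors from kcal/mol/ang to eV/ang.

    The length unit is unchanged, so only the energy factor applies.
    """
    return [[component * EV_PER_KCAL_MOL for component in atom] for atom in forces]


if __name__ == "__main__":
    print(energy_kcal_mol_to_ev(23.0605))
    print(forces_kcal_mol_ang_to_ev_ang([[1.0, 0.0, -2.0]]))
```

Because angstrom is kept as the length unit, energies and forces share the same scale factor.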

Source code in openqdc/datasets/potential/revmd17.py
class RevMD17(BaseDataset):
    """
    Revised MD (RevMD17) improves upon the MD17 dataset by removing all the numerical noise present in the original
    dataset. The data is generated from an ab-initio molecular dynamics (AIMD) simulation where forces and energies
    are computed at the PBE/def2-SVP level of theory using very tight SCF convergence and a very dense DFT integration
    grid. The dataset contains the following molecules:
        Benzene: 627000 samples\n
        Uracil: 133000 samples\n
        Naphthalene: 326000 samples\n
        Aspirin: 211000 samples\n
        Salicylic Acid: 320000 samples\n
        Malonaldehyde: 993000 samples\n
        Ethanol: 555000 samples\n
        Toluene: 100000 samples\n

    Usage:
    ```python
    from openqdc.datasets import RevMD17
    dataset = RevMD17()
    ```

    References:
        https://arxiv.org/abs/2007.09593
    """

    __name__ = "revmd17"

    __energy_methods__ = [
        PotentialMethod.PBE_DEF2_TZVP,  # "pbe/def2-tzvp"
    ]
    __force_mask__ = [True]

    energy_target_names = [
        "PBE-TS Energy",
    ]

    __force_methods__ = [
        "pbe/def2-tzvp",
    ]

    force_target_names = [
        "PBE-TS Gradient",
    ]
    __links__ = {"revmd17.zip": "https://figshare.com/ndownloader/articles/12672038/versions/3"}

    __energy_unit__ = "kcal/mol"
    __distance_unit__ = "ang"
    __forces_unit__ = "kcal/mol/ang"

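    def read_raw_entries(self):
        # `p_join`, `decompress_tar_gz`, `trajectories` and `read_npz_entry` are not defined in this
        # excerpt; they must come from the part of revmd17.py that precedes it.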
        entries_list = []
        decompress_tar_gz(p_join(self.root, "rmd17.tar.bz2"))
        for trajectory in trajectories:
            entries_list.append(read_npz_entry(trajectory, self.root))
        return entries_list
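
`read_raw_entries` first unpacks `rmd17.tar.bz2` and then reads one entry per trajectory. The body of `decompress_tar_gz` is not shown in this excerpt, so the following is only a sketch of the unpacking step using the standard library's `tarfile` module. The name `extract_archive` and its parameters are hypothetical and do not reflect openqdc's implementation.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir=None):
    """Extract a tar archive (gzip, bzip2 or xz) and return its member names.

    Hypothetical stand-in for the `decompress_tar_gz` helper used above;
    it is not openqdc's implementation.
    """
    archive_path = Path(archive_path)
    # Default to extracting next to the archive itself.
    dest = Path(dest_dir) if dest_dir is not None else archive_path.parent
    # Mode "r:*" lets tarfile detect the compression, so a .tar.bz2 file works too.
    with tarfile.open(archive_path, "r:*") as tar:
        names = tar.getnames()
        # Only extract archives from a trusted source: members can carry unsafe paths.
        tar.extractall(path=dest)
    return names
```

Under these assumptions, a call such as `extract_archive(p_join(self.root, "rmd17.tar.bz2"))` would unpack the archive next to the downloaded file before the per-trajectory loop runs.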