
Solvated Peptides

SolvatedPeptides

Bases: BaseDataset

The solvated protein fragments dataset probes many-body intermolecular interactions between "protein fragments" and water molecules. Geometries are first optimized with the semi-empirical method PM7, and MD simulations are then run at 1000 K with a time step of 0.1 fs using the Atomic Simulation Environment (ASE). Structures are saved every 10 steps, and energies, forces, and dipole moments are calculated for each saved structure at the revPBE-D3(BJ)/def2-TZVP level of theory.
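As a quick check of the sampling protocol described above, the spacing between saved structures follows directly from the time step and the save interval. The sketch below only restates that arithmetic; the names are illustrative and are not part of openqdc or ASE.

```python
# Values taken from the dataset description above.
TIME_STEP_FS = 0.1  # MD time step in femtoseconds
SAVE_EVERY = 10     # a structure is saved every 10 MD steps


def frame_spacing_fs(time_step_fs: float, save_every: int) -> float:
    """Simulated time between two consecutive saved structures, in fs."""
    return time_step_fs * save_every


spacing = frame_spacing_fs(TIME_STEP_FS, SAVE_EVERY)
print(f"{spacing:.1f} fs between saved structures")
```

Consecutive saved structures are therefore closely spaced in simulated time, which is worth keeping in mind when splitting the data for training and evaluation.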

Usage:

from openqdc.datasets import SolvatedPeptides
dataset = SolvatedPeptides()

References

https://doi.org/10.1021/acs.jctc.9b00181

https://zenodo.org/records/2605372

Source code in openqdc/datasets/potential/solvated_peptides.py
class SolvatedPeptides(BaseDataset):
    """
    The solvated protein fragments dataset probes many-body intermolecular interactions between "protein fragments"
    and water molecules. Geometries are first optimized with the semi-empirical method PM7, and MD simulations are
    then run at 1000 K with a time step of 0.1 fs using the Atomic Simulation Environment (ASE). Structures are saved
    every 10 steps, and energies, forces, and dipole moments are calculated for each saved structure at the
    revPBE-D3(BJ)/def2-TZVP level of theory.

    Usage:
    ```python
    from openqdc.datasets import SolvatedPeptides
    dataset = SolvatedPeptides()
    ```

    References:
        https://doi.org/10.1021/acs.jctc.9b00181\n
        https://zenodo.org/records/2605372
    """

    __name__ = "solvated_peptides"

    __energy_methods__ = [
        PotentialMethod.REVPBE_D3_BJ_DEF2_TZVP
        # "revpbe-d3(bj)/def2-tzvp",
    ]

    energy_target_names = [
        "revPBE-D3(BJ):def2-TZVP Atomization Energy",
    ]

    __force_mask__ = [True]

    force_target_names = [
        "revPBE-D3(BJ):def2-TZVP Gradient",
    ]

    # TO CHECK
    __energy_unit__ = "ev"
    __distance_unit__ = "ang"
    __forces_unit__ = "ev/ang"
    __links__ = {"solvated_peptides.hdf5.gz": "https://zenodo.org/record/3585804/files/213.hdf5.gz"}

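    def __smiles_converter__(self, x):
        """Utility function to convert a stored string to SMILES: useful if the SMILES is
        encoded in a different format than its display format. Here the byte string is
        decoded as ASCII and its last underscore-separated field is dropped.
        """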
        return "_".join(x.decode("ascii").split("_")[:-1])

    def read_raw_entries(self):
        raw_path = p_join(self.root, "solvated_peptides.h5.gz")
        samples = read_qc_archive_h5(raw_path, "solvated_peptides", self.energy_target_names, self.force_target_names)

        return samples
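
The class above declares energies in eV, distances in Å, and forces in eV/Å (flagged `# TO CHECK` in the source). If other units are needed downstream, a conversion could be sketched as follows. This is a standalone illustration with a hard-coded, rounded conversion factor, not openqdc's own unit handling; the function names are hypothetical.

```python
# 1 eV expressed in kcal/mol (rounded value; an assumption of this sketch).
EV_TO_KCAL_PER_MOL = 23.060548


def ev_to_kcal_per_mol(value_ev: float) -> float:
    """Convert an energy from eV to kcal/mol."""
    return value_ev * EV_TO_KCAL_PER_MOL


def forces_to_kcal_per_mol_ang(forces_ev_per_ang):
    """Convert per-atom force vectors from eV/Å to kcal/(mol·Å).

    Only the energy part of the unit changes, so the same factor applies.
    """
    return [[ev_to_kcal_per_mol(c) for c in atom] for atom in forces_ev_per_ang]


# Made-up numbers, purely for illustration.
print(ev_to_kcal_per_mol(1.0))
print(forces_to_kcal_per_mol_ang([[1.0, 0.0, -2.0]]))
```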

__smiles_converter__(x)

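Utility function that converts a stored string to SMILES; useful if the SMILES is encoded in a different format than its display format. Here the byte string is decoded as ASCII and its last underscore-separated field is dropped.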

Source code in openqdc/datasets/potential/solvated_peptides.py
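def __smiles_converter__(self, x):
    """Utility function to convert a stored string to SMILES: useful if the SMILES is
    encoded in a different format than its display format. Here the byte string is
    decoded as ASCII and its last underscore-separated field is dropped.
    """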
    return "_".join(x.decode("ascii").split("_")[:-1])
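
To illustrate what `__smiles_converter__` does, the same expression can be run as a standalone function. The input strings below are made up for illustration and are not taken from the dataset; the only assumption is that raw names are ASCII byte strings whose last underscore-separated field is a suffix to be dropped.

```python
def smiles_converter(x: bytes) -> str:
    """Standalone copy of the method body: decode, then drop the last '_' field."""
    return "_".join(x.decode("ascii").split("_")[:-1])


# Hypothetical raw names; real entries in the dataset may look different.
print(smiles_converter(b"CCO_12"))   # CCO
print(smiles_converter(b"C_C_O_3"))  # C_C_O
print(repr(smiles_converter(b"CCO")))  # '' (no '_' suffix, so nothing is left)
```

Note the last case: a name without any underscore yields an empty string, because the only field is treated as the suffix and removed.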