Bases: BaseDataset
The solvated protein fragments dataset probes many-body intermolecular interactions between "protein fragments"
and water molecules. Geometries are first optimized with the semi-empirical method PM7, and MD simulations are then
run at 1000 K with a time step of 0.1 fs using the Atomic Simulation Environment (ASE). Structures are saved every 10
steps, and their energies, forces, and dipole moments are calculated at the revPBE-D3(BJ)/def2-TZVP level of theory.
Usage:
from openqdc.datasets import SolvatedPeptides
dataset = SolvatedPeptides()
 
References:
https://doi.org/10.1021/acs.jctc.9b00181
https://zenodo.org/records/2605372
 
              
Source code in openqdc/datasets/potential/solvated_peptides.py
class SolvatedPeptides(BaseDataset):
    """
    The solvated protein fragments dataset probes many-body intermolecular interactions between "protein fragments"
    and water molecules. Geometries are first optimized with the semi-empirical method PM7 and then MD simulations are
    run at 1000 K with a time step of 0.1 fs using the Atomic Simulation Environment (ASE). Structures are saved every 10
    steps, where energies, forces and dipole moments are calculated at the revPBE-D3(BJ)/def2-TZVP level of theory.
    Usage:
    ```python
    from openqdc.datasets import SolvatedPeptides
    dataset = SolvatedPeptides()
    ```
    References:
        https://doi.org/10.1021/acs.jctc.9b00181\n
        https://zenodo.org/records/2605372
    """
    __name__ = "solvated_peptides"
    __energy_methods__ = [
        PotentialMethod.REVPBE_D3_BJ_DEF2_TZVP
        # "revpbe-d3(bj)/def2-tzvp",
    ]
    energy_target_names = [
        "revPBE-D3(BJ):def2-TZVP Atomization Energy",
    ]
    __force_mask__ = [True]
    force_target_names = [
        "revPBE-D3(BJ):def2-TZVP Gradient",
    ]
    # TO CHECK
    __energy_unit__ = "ev"
    __distance_unit__ = "ang"
    __forces_unit__ = "ev/ang"
    __links__ = {"solvated_peptides.hdf5.gz": "https://zenodo.org/record/3585804/files/213.hdf5.gz"}

    def __smiles_converter__(self, x):
        """util function to convert string to smiles: useful if the smiles is
        encoded in a different format than its display format
        """
        return "_".join(x.decode("ascii").split("_")[:-1])

    def read_raw_entries(self):
        raw_path = p_join(self.root, "solvated_peptides.h5.gz")
        samples = read_qc_archive_h5(raw_path, "solvated_peptides", self.energy_target_names, self.force_target_names)
        return samples
 
               
  
__smiles_converter__(x)

Utility function that converts a stored string to SMILES. It decodes the ASCII byte string and drops its
final underscore-separated token, which is useful when the SMILES is encoded in a format different from
its display format.

Source code in openqdc/datasets/potential/solvated_peptides.py
def __smiles_converter__(self, x):
    """util function to convert string to smiles: useful if the smiles is
    encoded in a different format than its display format
    """
    return "_".join(x.decode("ascii").split("_")[:-1])
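To make the converter's behavior concrete, here is a minimal standalone sketch of the same expression outside the class. The function name `strip_trailing_token` and the sample keys are hypothetical and chosen only for illustration; the actual key format stored in the raw archive is not shown here and may differ.

```python
def strip_trailing_token(raw: bytes) -> str:
    """Decode an ASCII byte string and drop its last underscore-separated token.

    Mirrors the expression used in SolvatedPeptides.__smiles_converter__;
    the function name itself is hypothetical.
    """
    tokens = raw.decode("ascii").split("_")
    # Keep every token except the final one, then rejoin with underscores.
    return "_".join(tokens[:-1])


# Hypothetical keys, not taken from the dataset.
print(strip_trailing_token(b"CCO_water_0012"))  # CCO_water
print(repr(strip_trailing_token(b"CCO")))       # '' (no underscore: nothing is left)
```

Note the edge case in the second call: a key with no underscore yields an empty string, so the expression implicitly assumes every stored key carries at least one underscore-separated suffix.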