
SCAN Waterclusters

SCANWaterClusters

Bases: BaseDataset

The SCAN Water Clusters dataset contains conformations of neutral water clusters with up to 20 monomers, charged water clusters, and alkali- and halide-water clusters. It combines four sets of water clusters: the neutral water cluster subset of the benchmark energy and geometry database (BEGDB); the WATER27 set of 14 neutral, 5 protonated, 7 deprotonated, and 1 auto-ionized water cluster; and two sets of ion-water clusters M...(H2O)n, where M = Li+, Na+, K+, F−, Cl−, or Br−. The water clusters were obtained from 10-nanosecond gas-phase molecular dynamics simulations with AMBER 9 and then optimized; the lowest-energy isomers were identified from MP2/aug-cc-pVDZ//MP2/6-31G* Gibbs free energies.
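The "27" in WATER27 is simply the total of the cluster types listed above. The short check below tallies them and lists the four subset names, which are copied from the `subsets` class attribute in the source code; it is plain Python and does not use openqdc.

```python
# Cluster types in the WATER27 set, as stated in the dataset description.
water27_composition = {
    "neutral": 14,
    "protonated": 5,
    "deprotonated": 7,
    "auto-ionized": 1,
}

# The four subsets, copied from SCANWaterClusters.subsets.
subsets = ["BEGDB_H2O", "WATER27", "H2O_alkali_clusters", "H2O_halide_clusters"]

total_water27 = sum(water27_composition.values())
print(f"WATER27 contains {total_water27} clusters")  # WATER27 contains 27 clusters
print(f"{len(subsets)} subsets: {', '.join(subsets)}")
```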

Chemical Species

[H, O, Li, Na, K, F, Cl, Br]
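The species list follows from the cluster types: water contributes H and O, and each ion-water cluster M...(H2O)n adds one of Li, Na, K, F, Cl, or Br. A minimal sketch that builds the element counts and net charge of such a cluster and checks them against the species list, assuming a single ion per cluster. `ion_water_cluster` is a hypothetical helper for illustration, not part of openqdc.

```python
from collections import Counter

# Formal charges of the ions M in M...(H2O)n, as listed in the description.
ION_CHARGES = {"Li": +1, "Na": +1, "K": +1, "F": -1, "Cl": -1, "Br": -1}

# Elements listed under "Chemical Species".
DATASET_SPECIES = {"H", "O", "Li", "Na", "K", "F", "Cl", "Br"}


def ion_water_cluster(ion: str, n_water: int) -> tuple[Counter, int]:
    """Hypothetical helper: element counts and net charge of one M...(H2O)n cluster.

    Assumes exactly one ion per cluster.
    """
    if ion not in ION_CHARGES:
        raise ValueError(f"unsupported ion: {ion}")
    if n_water < 1:
        raise ValueError("n_water must be at least 1")
    # Each water monomer contributes two H atoms and one O atom.
    composition = Counter({"H": 2 * n_water, "O": n_water})
    composition[ion] += 1
    return composition, ION_CHARGES[ion]


comp, charge = ion_water_cluster("Cl", 4)
print(dict(comp), charge)  # {'H': 8, 'O': 4, 'Cl': 1} -1

# Every element of the cluster should appear in the dataset's species list.
assert set(comp) <= DATASET_SPECIES
```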

Usage:

from openqdc.datasets import SCANWaterClusters
dataset = SCANWaterClusters()

References

https://chemrxiv.org/engage/chemrxiv/article-details/662aaff021291e5d1db7d8ec

https://github.com/esoteric-ephemera/water_cluster_density_errors

Source code in openqdc/datasets/potential/waterclusters.py
class SCANWaterClusters(BaseDataset):
    """
    The SCAN Water Clusters dataset contains conformations of
    neutral water clusters containing up to 20 monomers, charged water clusters,
    and alkali- and halide-water clusters. This dataset consists of four sets of water clusters:
    the benchmark energy and geometry database (BEGDB) neutral water cluster subset; the WATER27 set of 14
    neutral, 5 protonated, 7 deprotonated, and one auto-ionized water cluster; and two sets of
    ion-water clusters M...(H2O)n, where M = Li+, Na+, K+, F−, Cl−, or Br−.
    Water clusters were obtained from 10-nanosecond gas-phase molecular dynamics
    simulations using AMBER 9 and then optimized; the lowest-energy isomers
    were determined from MP2/aug-cc-pVDZ//MP2/6-31G* Gibbs free energies.


    Chemical Species:
        [H, O, Li, Na, K, F, Cl, Br]

    Usage:
    ```python
    from openqdc.datasets import SCANWaterClusters
    dataset = SCANWaterClusters()
    ```

    References:
        https://chemrxiv.org/engage/chemrxiv/article-details/662aaff021291e5d1db7d8ec\n
        https://github.com/esoteric-ephemera/water_cluster_density_errors
    """

    __name__ = "scanwaterclusters"

    __energy_unit__ = "hartree"
    __distance_unit__ = "ang"
    __forces_unit__ = "hartree/ang"
    energy_target_names = [
        "HF",
        "HF-r2SCAN-DC4",
        "SCAN",
        "SCAN@HF",
        "SCAN@r2SCAN50",
        "r2SCAN",
        "r2SCAN@HF",
        "r2SCAN@r2SCAN50",
        "r2SCAN50",
        "r2SCAN100",
        "r2SCAN10",
        "r2SCAN20",
        "r2SCAN25",
        "r2SCAN30",
        "r2SCAN40",
        "r2SCAN60",
        "r2SCAN70",
        "r2SCAN80",
        "r2SCAN90",
    ]
    __energy_methods__ = [PotentialMethod.NONE for _ in range(len(energy_target_names))]
    force_target_names = []
    # Subsets of the source data files; read_raw_entries loads them in this order
    subsets = ["BEGDB_H2O", "WATER27", "H2O_alkali_clusters", "H2O_halide_clusters"]
    __links__ = {
        "geometries.json.gz": "https://github.com/esoteric-ephemera/water_cluster_density_errors/blob/main/data_files/geometries.json.gz?raw=True",  # noqa
        "total_energies.json.gz": "https://github.com/esoteric-ephemera/water_cluster_density_errors/blob/main/data_files/total_energies.json.gz?raw=True",  # noqa
    }

    def read_raw_entries(self):
        entries = []
        for subset in self.subsets:
            # Load the geometries and the total energies for this subset
            geometries = read_geometries(p_join(self.root, "geometries.json.gz"), subset)
            energies = read_energies(p_join(self.root, "total_energies.json.gz"), subset)
            # Drop the "metadata" entry and keep only the total energies for each key
            datum = {}
            for k in energies:
                energies[k].pop("metadata")
                datum[k] = energies[k]["total_energies"]
            # Combine geometries and energies into dataset entries for this subset
            entries.extend(format_geometry_and_entries(geometries, datum, subset))
        return entries
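`read_raw_entries` drops each record's `"metadata"` entry and keeps its `"total_energies"` before handing the energies on. The sketch below reproduces that flattening step on a small gzip-compressed JSON payload held in memory. The record layout (each key mapping to a dict with `"metadata"` and `"total_energies"`) is inferred from the loop above. The key `cluster_a` and the energy values are placeholders, not real data, and `flatten_total_energies` is a hypothetical helper, not an openqdc function.

```python
import gzip
import io
import json


def flatten_total_energies(raw: dict) -> dict:
    """Hypothetical helper: keep only "total_energies" per key, dropping "metadata".

    Mirrors the loop in read_raw_entries; the record layout is an assumption.
    """
    datum = {}
    for key, record in raw.items():
        record = dict(record)  # shallow copy so the caller's dict is not mutated
        record.pop("metadata", None)
        datum[key] = record["total_energies"]
    return datum


# Placeholder payload standing in for one subset of total_energies.json.gz.
payload = {
    "cluster_a": {
        "metadata": {"note": "placeholder"},
        "total_energies": {"SCAN": -1.0, "HF": -2.0},  # placeholder values
    },
}

# Round-trip through gzip in memory, since the source file is gzip-compressed JSON.
buffer = io.BytesIO()
with gzip.open(buffer, "wt", encoding="utf-8") as fh:
    json.dump(payload, fh)
buffer.seek(0)
with gzip.open(buffer, "rt", encoding="utf-8") as fh:
    raw = json.load(fh)

flat = flatten_total_energies(raw)
print(flat)  # {'cluster_a': {'SCAN': -1.0, 'HF': -2.0}}
```

Unlike the loop in the class, which pops `"metadata"` in place, this sketch copies each record so the loaded data is left untouched; either choice gives the same flattened result.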
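The class declares energies in hartree, distances in ångström, and forces in hartree/ang (`__energy_unit__`, `__distance_unit__`, `__forces_unit__`). Differences between methods are often reported in kcal/mol or eV, so a minimal conversion sketch is shown below, using CODATA 2018 values for the hartree energy. The helper names are hypothetical, and the sketch does not rely on any openqdc conversion utilities.

```python
# CODATA 2018 conversion factors for the hartree energy.
HARTREE_TO_EV = 27.211386245988
HARTREE_TO_KCAL_PER_MOL = 627.5094740631


def hartree_to_kcal_per_mol(e_hartree: float) -> float:
    """Convert an energy (or energy difference) from hartree to kcal/mol."""
    return e_hartree * HARTREE_TO_KCAL_PER_MOL


def hartree_to_ev(e_hartree: float) -> float:
    """Convert an energy (or energy difference) from hartree to eV."""
    return e_hartree * HARTREE_TO_EV


def forces_to_ev_per_ang(forces_hartree_per_ang: list[float]) -> list[float]:
    """Convert force components from hartree/ang to eV/ang (length unit unchanged)."""
    return [f * HARTREE_TO_EV for f in forces_hartree_per_ang]


# An energy difference of 0.0016 hartree is roughly 1 kcal/mol.
print(round(hartree_to_kcal_per_mol(0.0016), 3))  # 1.004
```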
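Several entries in `energy_target_names` carry an `@` suffix. In density-corrected DFT notation, a label such as `SCAN@HF` is commonly read as the SCAN functional evaluated on the Hartree-Fock density; that reading is an assumption here, not something the source states. The sketch below splits each label at `@` and groups the functionals by the suffix. Labels without `@` get `density=None`, which only means "no suffix in the name": `HF-r2SCAN-DC4`, for example, encodes its density differently. `TargetName` and `parse_target` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class TargetName(NamedTuple):
    functional: str
    density: Optional[str]  # None when the label has no "@<density>" suffix


def parse_target(name: str) -> TargetName:
    """Split a label such as "SCAN@HF" into (functional, density suffix)."""
    functional, sep, density = name.partition("@")
    return TargetName(functional, density if sep else None)


# Copied from SCANWaterClusters.energy_target_names.
ENERGY_TARGET_NAMES = [
    "HF", "HF-r2SCAN-DC4", "SCAN", "SCAN@HF", "SCAN@r2SCAN50",
    "r2SCAN", "r2SCAN@HF", "r2SCAN@r2SCAN50", "r2SCAN50", "r2SCAN100",
    "r2SCAN10", "r2SCAN20", "r2SCAN25", "r2SCAN30", "r2SCAN40",
    "r2SCAN60", "r2SCAN70", "r2SCAN80", "r2SCAN90",
]

# Group functionals by the density suffix in their label.
by_density = defaultdict(list)
for label in ENERGY_TARGET_NAMES:
    target = parse_target(label)
    by_density[target.density].append(target.functional)

print(by_density["HF"])  # ['SCAN', 'r2SCAN']
```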