SCAN Waterclusters

`SCANWaterClusters`

Bases: BaseDataset

The SCAN Water Clusters dataset contains conformations of neutral water clusters with up to 20 monomers, charged water clusters, and alkali- and halide-water clusters. It consists of four data sets of water clusters: the benchmark energy and geometry database (BEGDB) neutral water cluster subset; the WATER27 set of 14 neutral, 5 protonated, 7 deprotonated, and one auto-ionized water cluster; and two sets of ion-water clusters M...(H2O)n, where M = Li+, Na+, K+, F−, Cl−, or Br−. Water clusters were obtained from 10 nanosecond gas-phase molecular dynamics simulations using AMBER 9 and then optimized; the lowest-energy isomers were determined using MP2/aug-cc-pVDZ//MP2/6-31G* Gibbs free energies.

Chemical Species

[H, O, Li, Na, K, F, Cl, Br]

Usage:

from openqdc.datasets import SCANWaterClusters
dataset = SCANWaterClusters()

References

https://chemrxiv.org/engage/chemrxiv/article-details/662aaff021291e5d1db7d8ec

https://github.com/esoteric-ephemera/water_cluster_density_errors

Source code in openqdc/datasets/potential/waterclusters.py

class SCANWaterClusters(BaseDataset):
    """
    The SCAN Water Clusters dataset contains conformations of
    neutral water clusters with up to 20 monomers, charged water clusters,
    and alkali- and halide-water clusters. It consists of four data sets of water clusters:
    the benchmark energy and geometry database (BEGDB) neutral water cluster subset; the WATER27 set of 14
    neutral, 5 protonated, 7 deprotonated, and one auto-ionized water cluster; and two sets of
    ion-water clusters M...(H2O)n, where M = Li+, Na+, K+, F−, Cl−, or Br−.
    Water clusters were obtained from 10 nanosecond gas-phase molecular dynamics
    simulations using AMBER 9 and then optimized; the
    lowest-energy isomers were determined using MP2/aug-cc-pVDZ//MP2/6-31G* Gibbs free energies.


    Chemical Species:
        [H, O, Li, Na, K, F, Cl, Br]

    Usage:
    ```python
    from openqdc.datasets import SCANWaterClusters
    dataset = SCANWaterClusters()
    ```

    References:
        https://chemrxiv.org/engage/chemrxiv/article-details/662aaff021291e5d1db7d8ec\n
        https://github.com/esoteric-ephemera/water_cluster_density_errors
    """

    __name__ = "scanwaterclusters"

    __energy_unit__ = "hartree"
    __distance_unit__ = "ang"
    __forces_unit__ = "hartree/ang"
    energy_target_names = [
        "HF",
        "HF-r2SCAN-DC4",
        "SCAN",
        "SCAN@HF",
        "SCAN@r2SCAN50",
        "r2SCAN",
        "r2SCAN@HF",
        "r2SCAN@r2SCAN50",
        "r2SCAN50",
        "r2SCAN100",
        "r2SCAN10",
        "r2SCAN20",
        "r2SCAN25",
        "r2SCAN30",
        "r2SCAN40",
        "r2SCAN60",
        "r2SCAN70",
        "r2SCAN80",
        "r2SCAN90",
    ]
    __energy_methods__ = [PotentialMethod.NONE for _ in range(len(energy_target_names))]
    force_target_names = []
    # 27            # 9 level
    subsets = ["BEGDB_H2O", "WATER27", "H2O_alkali_clusters", "H2O_halide_clusters"]
    __links__ = {
        "geometries.json.gz": "https://github.com/esoteric-ephemera/water_cluster_density_errors/blob/main/data_files/geometries.json.gz?raw=True",  # noqa
        "total_energies.json.gz": "https://github.com/esoteric-ephemera/water_cluster_density_errors/blob/main/data_files/total_energies.json.gz?raw=True",  # noqa
    }

    def read_raw_entries(self):
        entries = []  # noqa
        for subset in self.subsets:
            # Load the geometries and the per-method energies for this subset.
            geometries = read_geometries(p_join(self.root, "geometries.json.gz"), subset)
            energies = read_energies(p_join(self.root, "total_energies.json.gz"), subset)
            datum = {}
            for k in energies:
                # Drop the per-method metadata and keep only the total energies.
                energies[k].pop("metadata")
                datum[k] = energies[k]["total_energies"]
            # Pair each geometry with its energies and collect the records.
            entries.extend(format_geometry_and_entries(geometries, datum, subset))
        return entries
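
The inner loop of `read_raw_entries` above keeps the `"total_energies"` block of each method and discards its `"metadata"`. The following is a minimal, standard-library-only sketch of that step, not the library's implementation. The helpers `read_gz_json` and `collect_energies` are hypothetical names, the file layout (top-level keys are subset names, and each `"total_energies"` block maps a cluster label to an energy) is an assumption, and the numbers are placeholders rather than values from the dataset.

```python
import gzip
import json
import tempfile
from os.path import join as p_join


def read_gz_json(path):
    """Load a gzip-compressed JSON file into Python objects (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def collect_energies(energies_by_method):
    """Keep each method's "total_energies" block and skip its "metadata"."""
    return {
        method: block["total_energies"]
        for method, block in energies_by_method.items()
    }


# Assumed layout with placeholder numbers: one subset, two methods, two clusters.
toy = {
    "WATER27": {
        "HF": {"metadata": {}, "total_energies": {"H2O": -76.0, "(H2O)2": -152.1}},
        "SCAN": {"metadata": {}, "total_energies": {"H2O": -76.4, "(H2O)2": -152.9}},
    }
}

with tempfile.TemporaryDirectory() as root:
    # Write the toy payload the same way the real files are packaged (gzip + JSON).
    path = p_join(root, "total_energies.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(toy, fh)

    # Read it back and select one subset, then strip the metadata.
    datum = collect_energies(read_gz_json(path)["WATER27"])

print(sorted(datum))
print(datum["SCAN"]["H2O"])
```

Unlike the `pop("metadata")` call in the class, this sketch builds a new dictionary and leaves the loaded data unmodified, which makes it easier to inspect the metadata afterwards if needed.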
