Molecule3D
Bases: BaseDataset
The Molecule3D dataset consists of 3,899,647 molecules with equilibrium geometries and energies calculated at the B3LYP/6-31G* level of theory. The molecules are extracted from the PubChem database and cleaned by removing molecules with invalid molecule files, SMILES conversion errors, RDKit warnings, sanitization problems, or damaged log files.
Usage:
from openqdc.datasets import Molecule3D
dataset = Molecule3D()
Source code in openqdc/datasets/potential/molecule3d.py
read_mol(mol, energy)
Read an RDKit molecule (Chem.rdchem.Mol) and its energy (float), and return a dict with the conformer and its energy.
Parameters
- mol: Chem.rdchem.Mol. RDKit molecule.
- energy: float. Energy of the molecule.
Returns
- res: dict. Dictionary containing the following keys:
  - name: np.ndarray of shape (N,) containing the SMILES of the molecule
  - atomic_inputs: flattened np.ndarray of shape (M, 5), where M is the number of atoms, containing the atomic numbers, charges and positions
  - energies: np.ndarray of shape (1,) containing the energy of the conformer
  - n_atoms: np.ndarray of shape (1,) containing the number of atoms in the conformer
  - subset: np.ndarray of shape (1,) containing "molecule3d"
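The returned layout can be illustrated with a minimal NumPy-only sketch. It assumes the atomic numbers, formal charges and 3D positions have already been extracted from the RDKit molecule, and that the columns of atomic_inputs follow the order listed above (atomic number, charge, x, y, z). `build_record` is a hypothetical helper, not part of openqdc, and the water coordinates and zero energy below are illustrative placeholders, not dataset values.

```python
import numpy as np


def build_record(smiles, atomic_numbers, charges, positions, energy):
    """Hypothetical helper mirroring the documented read_mol output layout.

    Assumes per-atom arrays were already pulled out of an RDKit molecule;
    RDKit itself is not used here.
    """
    z = np.asarray(atomic_numbers, dtype=np.float32).reshape(-1, 1)  # (M, 1)
    q = np.asarray(charges, dtype=np.float32).reshape(-1, 1)  # (M, 1)
    xyz = np.asarray(positions, dtype=np.float32).reshape(-1, 3)  # (M, 3)
    if not (z.shape[0] == q.shape[0] == xyz.shape[0]):
        raise ValueError("need one row per atom in every input")
    return {
        "name": np.array([smiles]),
        # Column order assumed: atomic number, charge, x, y, z -> (M, 5)
        "atomic_inputs": np.concatenate([z, q, xyz], axis=1),
        "energies": np.array([energy], dtype=np.float64),  # (1,)
        "n_atoms": np.array([z.shape[0]], dtype=np.int32),  # (1,)
        "subset": np.array(["molecule3d"]),
    }


# Illustrative water molecule; the energy is a placeholder value.
record = build_record(
    "O",
    atomic_numbers=[8, 1, 1],
    charges=[0, 0, 0],
    positions=[[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]],
    energy=0.0,
)
print(record["atomic_inputs"].shape, record["n_atoms"])
```

Stacking the per-atom columns side by side is what gives atomic_inputs one row per atom and five columns, so n_atoms always equals the number of rows in atomic_inputs.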
Source code in openqdc/datasets/potential/molecule3d.py