Format loading
GeneralStructure
Bases: ABC
Abstract factory class for the dataset types in the openQDC package.
Source code in openqdc/datasets/structure.py
load_fn: Callable (abstractmethod, property)
Function to use for loading the data. Must be implemented by the child class.
Returns:
Type | Description |
---|---|
Callable | the function to use for loading the data |
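A minimal sketch of how a child class might satisfy this abstract property. `StructureSketch` and `JsonStructure` are hypothetical stand-ins written for illustration, not classes from openQDC, and `json.loads` is used only because it needs no extra dependency.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable


class StructureSketch(ABC):
    """Hypothetical stand-in for GeneralStructure (not the openQDC class)."""

    @property
    @abstractmethod
    def load_fn(self) -> Callable:
        """Return the callable used to load the data."""


class JsonStructure(StructureSketch):
    """Illustrative child class whose loader parses JSON text."""

    @property
    def load_fn(self) -> Callable:
        return json.loads


# The property returns the function itself; the caller invokes it.
values = JsonStructure().load_fn("[1, 2, 3]")

# The base class cannot be instantiated while load_fn is abstract.
try:
    StructureSketch()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
```

Stacking `@property` on top of `@abstractmethod` makes Python refuse to instantiate any subclass that forgets to define the loader.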
add_extension(filename)
Add the correct extension to a filename.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | the filename to add the extension to | required |
Returns:
Type | Description |
---|---|
str | the filename with the extension |
Source code in openqdc/datasets/structure.py
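As an illustration only, the behaviour described above could look like the following. The class name, the `ext` attribute, and the `.mmap` value are all assumptions made for this sketch, not details confirmed by this page.

```python
class ExtensionSketch:
    """Hypothetical structure that owns a single file extension."""

    ext = ".mmap"  # assumed attribute name and value, for illustration only

    def add_extension(self, filename: str) -> str:
        # Plain concatenation: the extension already carries its leading dot.
        return filename + self.ext


with_ext = ExtensionSketch().add_extension("energies")
```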
join_and_ext(path, filename)
Join a path and a filename and add the correct extension.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Union[str, PathLike] | the path to join | required |
filename | str | the filename to join | required |
Returns:
Type | Description |
---|---|
Union[str, PathLike] | the joined path with the correct extension |
Source code in openqdc/datasets/structure.py
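A hedged sketch of the join-then-extend step, written as a free function so it runs on its own. The function name and the `ext` parameter with its `.mmap` default are hypothetical; in the real class the extension presumably comes from the structure itself.

```python
import os
from os import PathLike
from typing import Union


def join_and_ext_sketch(path: Union[str, PathLike], filename: str, ext: str = ".mmap") -> str:
    """Join `path` and `filename`, then append `ext` (a hypothetical default)."""
    return os.path.join(path, filename + ext)


full_path = join_and_ext_sketch("cache", "energies")
```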
load_data(preprocess_path, data_keys, data_types, data_shapes, extra_data_keys, overwrite)
Main method for loading data from a file-based structure such as memmap or zarr.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preprocess_path | Union[str, PathLike] | path to the preprocessed data file | required |
data_keys | List[str] | list of keys to load from the data file | required |
data_types | Dict[str, dtype] | dictionary of data types for each key | required |
data_shapes | Dict[str, Tuple[int, int]] | dictionary of shapes for each key | required |
extra_data_keys | List[str] | list of keys to load from the extra data file | required |
overwrite | bool | whether to overwrite the local cache | required |
Source code in openqdc/datasets/structure.py
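The sketch below shows one plausible per-key loading loop built from the parameters above: resolve the file name, open it with a loader, and reshape the result. It is an assumption-laden illustration, not the openQDC implementation. `np.memmap` stands in for `load_fn`, the `.mmap` extension is hypothetical, and the extra-file and `overwrite` handling are left out.

```python
import os
import tempfile

import numpy as np


def load_data_sketch(preprocess_path, data_keys, data_types, data_shapes, ext=".mmap"):
    """Load each key from <preprocess_path>/<key><ext> and reshape it."""
    data = {}
    for key in data_keys:
        filename = os.path.join(preprocess_path, key + ext)
        # np.memmap stands in for `load_fn`; mode="r" opens the file read-only.
        flat = np.memmap(filename, mode="r", dtype=data_types[key])
        data[key] = flat.reshape(*data_shapes[key])
    return data


with tempfile.TemporaryDirectory() as tmp:
    # Write a small flat binary file so that there is something to load.
    np.arange(6, dtype=np.float32).tofile(os.path.join(tmp, "energies.mmap"))
    loaded = load_data_sketch(
        tmp,
        data_keys=["energies"],
        data_types={"energies": np.float32},
        data_shapes={"energies": (-1, 2)},
    )
    # Copy into memory before the temporary directory is removed.
    energies = np.array(loaded["energies"])
    del loaded
```

Passing `-1` as the first dimension lets NumPy infer the number of rows from the file size, which is convenient when only the per-row width is known in advance.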
load_extra_files(data, preprocess_path, data_keys, pkl_data_keys, overwrite) (abstractmethod)
Load extra files required to define other types of data. Must be implemented by the child class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[str, ndarray] | dictionary of data to load | required |
preprocess_path | Union[str, PathLike] | path to the preprocessed data file | required |
data_keys | List[str] | list of keys to load from the data file | required |
pkl_data_keys | List[str] | list of keys to load from the extra files | required |
overwrite | bool | whether to overwrite the local cache | required |
Source code in openqdc/datasets/structure.py
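One way a child class might implement this hook is to read a pickle file and merge the requested keys into the already loaded arrays. The sketch below assumes a `props.pkl` file in the preprocess directory and drops the `data_keys` and `overwrite` parameters for brevity; the function name and the sample values are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, List


def load_extra_files_sketch(
    data: Dict[str, Any],
    preprocess_path: str,
    pkl_data_keys: List[str],
    filename: str = "props.pkl",
) -> Dict[str, Any]:
    """Merge the requested keys from a pickle file into `data`."""
    with open(os.path.join(preprocess_path, filename), "rb") as handle:
        extra = pickle.load(handle)
    for key in pkl_data_keys:
        data[key] = extra[key]
    return data


with tempfile.TemporaryDirectory() as tmp:
    # Create a small pickle file to read back.
    with open(os.path.join(tmp, "props.pkl"), "wb") as handle:
        pickle.dump({"name": ["water", "methane"], "unused": 0}, handle)
    merged = load_extra_files_sketch({"energies": [0.1, 0.2]}, tmp, ["name"])
```

Only the keys listed in `pkl_data_keys` are copied, so unrelated entries in the pickle file do not leak into the result.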
save_preprocess(preprocess_path, data_keys, data_dict, extra_data_keys, extra_data_types) (abstractmethod)
Save the preprocessed data to the cache directory and optionally upload it to the remote storage. Must be implemented by the child class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preprocess_path | Union[str, PathLike] | path to the preprocessed data file | required |
data_keys | List[str] | list of keys to load from the data file | required |
data_dict | Dict[str, ndarray] | dictionary of data to save | required |
extra_data_keys | List[str] | list of keys to load from the extra data file | required |
extra_data_types | Dict[str, type] | dictionary of data types for each key | required |
Source code in openqdc/datasets/structure.py
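A rough sketch of what a local save step could look like: one flat binary file per array key, plus one pickle file for the extra keys. The function name, the `.mmap` extension, and the `props.pkl` target are assumptions for illustration. The `extra_data_types` parameter and the optional remote upload are omitted.

```python
import os
import pickle
import tempfile

import numpy as np


def save_preprocess_sketch(preprocess_path, data_keys, data_dict, extra_data_keys, ext=".mmap"):
    """Write one flat binary file per key, plus one pickle for the extras."""
    written = []
    for key in data_keys:
        path = os.path.join(preprocess_path, key + ext)
        # tofile() dumps the raw array bytes; shape and dtype are not stored.
        np.ascontiguousarray(data_dict[key]).tofile(path)
        written.append(path)
    extras_path = os.path.join(preprocess_path, "props.pkl")
    with open(extras_path, "wb") as handle:
        pickle.dump({key: data_dict[key] for key in extra_data_keys}, handle)
    written.append(extras_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    data_dict = {
        "energies": np.arange(6, dtype=np.float32).reshape(3, 2),
        "name": ["water", "methane", "ethane"],
    }
    written = save_preprocess_sketch(tmp, ["energies"], data_dict, ["name"])
    names = [os.path.basename(path) for path in written]
    n_bytes = os.path.getsize(written[0])
```

Because the binary files carry no header, the loader has to be told the dtype and shape separately, which matches the `data_types` and `data_shapes` arguments of `load_data`.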
unpack(data)
Unpack the data from the loaded file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | any | the data to unpack | required |
Returns:
Type | Description |
---|---|
any | the unpacked data |
Source code in openqdc/datasets/structure.py
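This page does not say what the default `unpack` does. The sketch below assumes a pass-through default that a child class can override to materialise lazily loaded data. Both class names are hypothetical.

```python
class PassThroughSketch:
    """Hypothetical base: hand the loaded object back unchanged."""

    def unpack(self, data):
        return data


class EagerSketch(PassThroughSketch):
    """Hypothetical child: take a full slice so the caller gets a copy."""

    def unpack(self, data):
        return data[:]


raw = [1.0, 2.0, 3.0]
same_object = PassThroughSketch().unpack(raw)
copied = EagerSketch().unpack(raw)
```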
MemMapDataset
Bases: GeneralStructure
Dataset structure for memory-mapped numpy arrays and props.pkl files.
Source code in openqdc/datasets/structure.py
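For readers unfamiliar with memory-mapped arrays, the following sketch shows the NumPy `np.memmap` round trip that a structure like this would build on. It does not reproduce the class itself, and the file name `positions.mmap` is hypothetical.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "positions.mmap")

    # mode="w+" creates the file; flush() pushes the buffer to disk.
    writer = np.memmap(path, mode="w+", dtype=np.float32, shape=(4, 3))
    writer[:] = np.arange(12, dtype=np.float32).reshape(4, 3)
    writer.flush()
    del writer

    # mode="r" maps the file read-only; the shape must be supplied again
    # because the file stores raw bytes without a header.
    reader = np.memmap(path, mode="r", dtype=np.float32, shape=(4, 3))
    last_row = reader[-1].tolist()
    try:
        reader[0, 0] = 1.0
        read_only = False
    except ValueError:
        read_only = True
    del reader
```

Opening with `mode="r"` guards the cached files against accidental in-place edits, and only the pages that are actually indexed get read from disk.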
ZarrDataset
Bases: GeneralStructure
Dataset structure for zarr files.
Source code in openqdc/datasets/structure.py