CLI for dataset downloading and uploading

You can quickly download, fetch, preprocess and upload openQDC datasets using the command line interface (CLI).

Datasets

Print a formatted table of the available openQDC datasets and some information about them.

Usage:

openqdc datasets [OPTIONS]

Options:

--help          Show this message and exit.

Cache

Get the current local cache path of openQDC.

Usage:

openqdc cache [OPTIONS]

Options:

--help          Show this message and exit.

Download

Download preprocessed, ML-ready datasets from the main openQDC hub.

Usage:

openqdc download DATASETS... [OPTIONS]

Options:

--help          Show this message and exit.
--overwrite     Whether to force the re-download of the datasets and overwrite the current cached dataset. [default: no-overwrite]
--cache-dir     Path to the cache. If not provided, the default cache directory (.cache/openqdc/) will be used. [default: None]
--as-zarr       Whether to use the zarr format for the datasets instead of memmap. [default: no-as-zarr]
--gs            Which source to download from. If set, Google Storage is used; otherwise, AWS S3 is used. [default: no-gs]

Example:

openqdc download Spice
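
When the download step is driven from a script, the options listed above can be assembled programmatically. The following is a minimal sketch, not part of openQDC: `build_download_command` is a hypothetical helper, and the cache path `/tmp/openqdc` is only an illustrative assumption. It uses only the flags documented in this section and builds the argument list without running anything.

```python
import shlex
from typing import List, Optional, Sequence


def build_download_command(
    datasets: Sequence[str],
    overwrite: bool = False,
    cache_dir: Optional[str] = None,
    as_zarr: bool = False,
    gs: bool = False,
) -> List[str]:
    """Build the argv list for `openqdc download` (hypothetical helper)."""
    if not datasets:
        raise ValueError("at least one dataset name is required")
    cmd = ["openqdc", "download", *datasets]
    if overwrite:
        cmd.append("--overwrite")  # force re-download of cached datasets
    if cache_dir is not None:
        cmd += ["--cache-dir", cache_dir]  # otherwise the default cache is used
    if as_zarr:
        cmd.append("--as-zarr")  # zarr instead of memmap
    if gs:
        cmd.append("--gs")  # Google Storage instead of AWS S3
    return cmd


# Illustrative values only; "/tmp/openqdc" is an assumed cache location.
cmd = build_download_command(["Spice", "QMugs"], cache_dir="/tmp/openqdc", as_zarr=True)
print(shlex.join(cmd))
# prints: openqdc download Spice QMugs --cache-dir /tmp/openqdc --as-zarr
```

To actually run the command, the list can be passed to `subprocess.run(cmd, check=True)`, assuming the `openqdc` executable is on the `PATH`.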

Fetch

Download the raw dataset files from the main openQDC hub.

Note:

Special cases: "all", "potential", and "interaction" can be passed in place of a dataset name to select a group of datasets rather than a single one.

Usage:

openqdc fetch DATASETS... [OPTIONS]

Options:

--help          Show this message and exit.
--overwrite     Whether to overwrite or force the re-download of the raw files. [default: no-overwrite]
--cache-dir     Path to the cache. If not provided, the default cache directory (.cache/openqdc/) will be used. [default: None]

Example:

openqdc fetch Spice

Preprocess

Preprocess a raw dataset (previously fetched) into an openQDC dataset and optionally push it to the remote storage.

Usage:

openqdc preprocess DATASETS... [OPTIONS]

Options:

--help         Show this message and exit.
--overwrite    Whether to overwrite the current cached datasets. [default: overwrite]
--upload       Whether to attempt the upload to the remote storage. Must have write permissions. [default: no-upload]
--as-zarr      Whether to preprocess to the zarr format instead of the memmap format. [default: no-as-zarr]

Example:

openqdc preprocess Spice QMugs

Upload

Upload a preprocessed dataset to the remote storage.

Usage:

openqdc upload DATASETS... [OPTIONS]

Options:

--help          Show this message and exit.
--overwrite     Whether to overwrite the remote files if they are present. [default: overwrite]
--as-zarr       Whether to upload the zarr files if available. [default: no-as-zarr]

Example:

openqdc upload Spice --overwrite
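
The fetch, preprocess and upload commands naturally chain into a pipeline. The sketch below shows one way this could be scripted; `pipeline_commands` and `run_pipeline` are hypothetical helpers, not part of openQDC, and the sketch assumes the `openqdc` executable is on the `PATH`. It uses a separate `upload` call, although `preprocess --upload` would be an alternative. By default it only prints the commands (dry run).

```python
import subprocess
from typing import List, Sequence


def pipeline_commands(datasets: Sequence[str], upload: bool = False) -> List[List[str]]:
    """Return the CLI calls for fetch -> preprocess (-> upload), in order."""
    names = list(datasets)
    steps = [
        ["openqdc", "fetch", *names],       # download the raw files
        ["openqdc", "preprocess", *names],  # turn them into openQDC datasets
    ]
    if upload:
        # Uploading requires write permissions on the remote storage.
        steps.append(["openqdc", "upload", *names])
    return steps


def run_pipeline(datasets: Sequence[str], upload: bool = False, dry_run: bool = True) -> None:
    """Print each step and, unless dry_run is set, execute it."""
    for cmd in pipeline_commands(datasets, upload=upload):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps are skipped when an earlier one fails.
            subprocess.run(cmd, check=True)


run_pipeline(["Spice"], upload=True)  # dry run: only prints the three commands
```

Calling `run_pipeline(["Spice"], upload=True, dry_run=False)` would execute the steps for real.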

Convert

Convert a preprocessed dataset from a memmap dataset to a zarr dataset.

Usage:

openqdc convert DATASETS... [OPTIONS]

Options:

--help          Show this message and exit.
--overwrite     Whether to overwrite the current zarr cached datasets. [default: no-overwrite]
--download      Whether to force the re-download of the memmap datasets. [default: no-download]
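
As a usage illustration for the convert command, the sketch below converts several datasets one invocation at a time, so that a failure on one dataset does not stop the others. This is a design choice of the sketch, not a requirement of the CLI, which accepts several names at once. `convert_each` is a hypothetical helper, and the `run` parameter exists only so that the subprocess call can be swapped out in tests.

```python
import subprocess
from typing import Callable, Dict, Sequence


def convert_each(
    datasets: Sequence[str],
    overwrite: bool = False,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, int]:
    """Run `openqdc convert` once per dataset and collect the exit codes."""
    results: Dict[str, int] = {}
    for name in datasets:
        cmd = ["openqdc", "convert", name]
        if overwrite:
            cmd.append("--overwrite")  # replace existing zarr cached datasets
        # No check=True here: the exit code is recorded instead of raising.
        results[name] = run(cmd).returncode
    return results
```

For instance, `convert_each(["Spice", "QMugs"])` would return a mapping from dataset name to exit code, where a non-zero value marks a failed conversion.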