canns.data.datasets

Universal data loading utilities for CANNs.

This module provides generic functions to download and load data from URLs, with specialized support for CANNs example datasets.

Attributes

`BASE_URL`
`DATASETS`
`DEFAULT_DATA_DIR`
`HAS_DOWNLOAD_DEPS`
`HAS_NUMPY`
`HUGGINGFACE_REPO`
`LEFT_RIGHT_DATASET_DIR`

Functions

`compute_file_hash`(filepath)	Compute SHA256 hash of a file.
`detect_file_type`(filepath)	Detect file type based on extension.
`download_dataset`(dataset_key[, force])	Download a specific dataset.
`download_file_with_progress`(url, filepath[, chunk_size])	Download a file with progress bar.
`get_data_dir`()	Get the data directory, creating it if necessary.
`get_dataset_path`(dataset_key[, auto_setup])	Get path to a dataset, downloading/setting up if necessary.
`get_huggingface_upload_guide`()	Get guide for uploading datasets to Hugging Face.
`get_left_right_data_session`(session_id[, ...])	Download and return files for a Left_Right_data_of session.
`get_left_right_npz`(session_id, filename[, ...])	Download and return a specific Left_Right_data_of NPZ file.
`list_datasets`()	List available datasets with descriptions.
`load`(url[, cache_dir, force_download, file_type])	Universal data loading function that downloads and reads data from URLs.
`load_file`(filepath[, file_type])	Load data from file based on file type.
`quick_setup`()	Quick setup function to get datasets ready.

Module Contents

canns.data.datasets.compute_file_hash(filepath)

Compute SHA256 hash of a file.
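
The module's implementation is not shown on this page. As a minimal sketch of how a SHA256 file hash is commonly computed with the standard library, assuming chunked reads so that large dataset files are never held fully in memory (the name `sha256_of_file` and its `chunk_size` parameter are illustrative, not part of canns):

```python
import hashlib
from pathlib import Path


def sha256_of_file(filepath, chunk_size=8192):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(filepath).open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned digest with a published checksum is one way to confirm that a download completed without corruption.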

canns.data.datasets.detect_file_type(filepath)

Detect file type based on extension.
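
Extension-based detection can be sketched as a dictionary lookup on the file suffix. The type names below are the ones listed for the `file_type` parameter of `load`; the extension table itself, the function name `guess_file_type`, and the `"unknown"` fallback are assumptions for illustration, and the real mapping may differ:

```python
from pathlib import Path

# Hypothetical extension table; the module's actual mapping may differ.
EXTENSION_TO_TYPE = {
    ".txt": "text",
    ".npy": "numpy",
    ".npz": "numpy",
    ".json": "json",
    ".pkl": "pickle",
    ".pickle": "pickle",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
}


def guess_file_type(filepath, default="unknown"):
    """Map a path's extension to a type name, case-insensitively."""
    suffix = Path(filepath).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix, default)
```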

canns.data.datasets.download_dataset(dataset_key, force=False)

Download a specific dataset.

Parameters:

dataset_key (str) – Key of the dataset to download (e.g., ‘grid_1’, ‘roi_data’).
force (bool) – Whether to force re-download if file already exists.

Returns:

Path to downloaded file if successful, None otherwise.

Return type:

Path or None
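
The `force` flag and the `Path or None` return value suggest a skip-if-present pattern. The following is a sketch of that pattern only, not the module's code: `fetch_if_needed` and its `fetch` callable are hypothetical stand-ins for the real download step.

```python
from pathlib import Path


def fetch_if_needed(target, fetch, force=False):
    """Return `target` if it exists or was fetched successfully, else None.

    `fetch` is a hypothetical callable that writes the file to `target`.
    """
    target = Path(target)
    if target.exists() and not force:
        return target  # reuse the existing copy
    try:
        fetch(target)
    except OSError:
        return None  # report failure as None instead of raising
    return target if target.exists() else None
```

Callers of a function with this contract should check for `None` before using the result as a path.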

canns.data.datasets.download_file_with_progress(url, filepath, chunk_size=8192)

Download a file with progress bar.
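
Which HTTP client and progress-bar library the module uses is not stated here. The core of a chunked download with progress reporting can still be sketched with plain binary streams; `copy_with_progress` and its `on_progress` callback are illustrative names, and the in-memory streams stand in for an HTTP response body and an output file:

```python
import io


def copy_with_progress(source, sink, total=None, chunk_size=8192, on_progress=None):
    """Copy a binary stream in chunks, reporting the bytes copied so far."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # end of stream
        sink.write(chunk)
        copied += len(chunk)
        if on_progress is not None:
            on_progress(copied, total)
    return copied


# In-memory streams stand in for a response body and a destination file.
payload = b"x" * 20000
seen = []
out = io.BytesIO()
copy_with_progress(
    io.BytesIO(payload),
    out,
    total=len(payload),
    on_progress=lambda done, total: seen.append(done),
)
```

With the default `chunk_size` of 8192, the callback fires once per chunk, so a progress bar can be advanced from inside it.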

canns.data.datasets.get_data_dir()

Get the data directory, creating it if necessary.
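
"Creating it if necessary" is typically a single idempotent `mkdir` call. A minimal sketch, assuming nothing about where `DEFAULT_DATA_DIR` points (the helper name `ensure_dir` is illustrative):

```python
from pathlib import Path


def ensure_dir(path):
    """Create `path` (and any missing parents) if needed and return it."""
    directory = Path(path).expanduser()
    # exist_ok=True makes repeated calls safe.
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```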

canns.data.datasets.get_dataset_path(dataset_key, auto_setup=True)

Get path to a dataset, downloading/setting up if necessary.

Parameters:

dataset_key (str) – Key of the dataset.
auto_setup (bool) – Whether to automatically attempt setup if dataset not found.

Returns:

Path to dataset file if available, None otherwise.

Return type:

Path or None

canns.data.datasets.get_huggingface_upload_guide()

Get guide for uploading datasets to Hugging Face.

Returns:

Upload guide text.

Return type:

str

canns.data.datasets.get_left_right_data_session(session_id, auto_download=True, force=False)

Download and return files for a Left_Right_data_of session.

Parameters:

session_id (str) – Session folder name, e.g. “24365_2”.
auto_download (bool) – Whether to download missing files automatically.
force (bool) – Whether to force re-download of existing files.

Returns:

Mapping with keys “manifest”, “full_file”, and “module_files”, or None if the session is unavailable.

Return type:

dict or None
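
Because the return type is `dict or None`, callers need to handle the `None` case before indexing. The sketch below shows one way to do that; `summarize_session` is a hypothetical helper, and it assumes that “module_files” holds a collection of file paths, which this page does not confirm:

```python
def summarize_session(session):
    """Describe a session mapping, tolerating a None result."""
    if session is None:
        return "session unavailable"
    # Assumption: "module_files" is a list-like collection of paths.
    module_files = session.get("module_files") or []
    return (
        f"manifest={session.get('manifest')}, "
        f"full_file={session.get('full_file')}, "
        f"modules={len(module_files)}"
    )
```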

canns.data.datasets.get_left_right_npz(session_id, filename, auto_download=True, force=False)

Download and return a specific Left_Right_data_of NPZ file.

Parameters:

session_id (str) – Session folder name, e.g. “26034_3”.
filename (str) – File name inside the session folder, e.g. “26034_3_ASA_mec_gridModule02_n104_cm.npz”.
auto_download (bool) – Whether to download the file if missing.
force (bool) – Whether to force re-download of existing files.

Returns:

Path to the requested file if available, None otherwise.

Return type:

Path or None

canns.data.datasets.list_datasets()

List available datasets with descriptions.

canns.data.datasets.load(url, cache_dir=None, force_download=False, file_type=None)

Universal data loading function that downloads and reads data from URLs.

Parameters:

url (str) – URL to download data from.
cache_dir (str or Path, optional) – Directory in which to cache downloaded files. If None, a temporary directory is used.
force_download (bool) – Force re-download even if file exists in cache.
file_type (str, optional) – Force specific file type (‘text’, ‘numpy’, ‘json’, ‘pickle’, ‘hdf5’). If None, auto-detect from file extension.

Returns:

Loaded data.

Return type:

Any

Examples

>>> from canns.data.datasets import load
>>>
>>> # Load numpy data
>>> data = load('https://example.com/data.npz')
>>>
>>> # Load text data with custom cache
>>> data = load('https://example.com/data.txt', cache_dir='./cache')
>>>
>>> # Force specific file type
>>> data = load('https://example.com/data.bin', file_type='numpy')

canns.data.datasets.load_file(filepath, file_type=None)

Load data from file based on file type.

Parameters:

filepath (Path) – Path to the data file.
file_type (str, optional) – Force specific file type. If None, auto-detect from extension.

Returns:

Loaded data.

Return type:

Any
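
Loading "based on file type" can be sketched as a dispatch table from type name to reader function. This is an illustration under stated assumptions, not the module's implementation: `load_by_type`, the `LOADERS` and `SUFFIX_TO_TYPE` tables, and the `ValueError` on unknown types are all hypothetical, and only the standard-library formats are covered (the numpy and hdf5 readers are omitted).

```python
import json
import pickle
from pathlib import Path


def _load_text(path):
    return path.read_text(encoding="utf-8")


def _load_json(path):
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


def _load_pickle(path):
    # Only unpickle files from sources you trust.
    with path.open("rb") as fh:
        return pickle.load(fh)


# Hypothetical tables; the real module may support more types.
LOADERS = {"text": _load_text, "json": _load_json, "pickle": _load_pickle}
SUFFIX_TO_TYPE = {".txt": "text", ".json": "json", ".pkl": "pickle"}


def load_by_type(filepath, file_type=None):
    """Load a file with an explicit type, or infer the type from the suffix."""
    path = Path(filepath)
    if file_type is None:
        file_type = SUFFIX_TO_TYPE.get(path.suffix.lower())
    if file_type not in LOADERS:
        raise ValueError(f"Unsupported file type: {file_type!r}")
    return LOADERS[file_type](path)
```

Passing `file_type` explicitly overrides the suffix, which mirrors the "force specific file type" behavior described for the parameter.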

canns.data.datasets.quick_setup()

Quick setup function to get datasets ready.

Returns:

True if successful, False otherwise.

Return type:

bool

canns.data.datasets.BASE_URL = 'https://huggingface.co/datasets/canns-team/data-analysis-datasets/resolve/main/'

canns.data.datasets.DATASETS

canns.data.datasets.DEFAULT_DATA_DIR

canns.data.datasets.HAS_DOWNLOAD_DEPS = True

canns.data.datasets.HAS_NUMPY = True

canns.data.datasets.HUGGINGFACE_REPO = 'canns-team/data-analysis-datasets'

canns.data.datasets.LEFT_RIGHT_DATASET_DIR = 'Left_Right_data_of'
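
The constants above could plausibly combine into a download URL as sketched below. The two constant values are copied from this page; the layout `<dataset dir>/<session id>/<file name>` is an assumption inferred from the parameter descriptions of `get_left_right_npz`, and `left_right_url` is a hypothetical helper rather than a function of the module.

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://huggingface.co/datasets/canns-team/data-analysis-datasets/resolve/main/"
LEFT_RIGHT_DATASET_DIR = "Left_Right_data_of"


def left_right_url(session_id, filename):
    """Build a URL under BASE_URL, assuming a dir/session/file layout."""
    parts = (LEFT_RIGHT_DATASET_DIR, session_id, filename)
    # Percent-encode each path segment separately, then join with "/".
    relative = "/".join(quote(part) for part in parts)
    # BASE_URL ends with "/", so urljoin appends rather than replaces.
    return urljoin(BASE_URL, relative)
```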