canns.data.datasets

Universal data loading utilities for CANNs.

This module provides generic functions to download and load data from URLs, with specialized support for CANNs example datasets.

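Attributes

BASE_URL

DATASETS

DEFAULT_DATA_DIR

HAS_DOWNLOAD_DEPS

HAS_NUMPY

HUGGINGFACE_REPO

LEFT_RIGHT_DATASET_DIR
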
Functions

compute_file_hash(filepath)

Compute SHA256 hash of a file.

detect_file_type(filepath)

Detect file type based on extension.

download_dataset(dataset_key[, force])

Download a specific dataset.

download_file_with_progress(url, filepath[, chunk_size])

Download a file with a progress bar.

get_data_dir()

Get the data directory, creating it if necessary.

get_dataset_path(dataset_key[, auto_setup])

Get path to a dataset, downloading/setting up if necessary.

get_huggingface_upload_guide()

Get guide for uploading datasets to Hugging Face.

get_left_right_data_session(session_id[, ...])

Download and return files for a Left_Right_data_of session.

get_left_right_npz(session_id, filename[, ...])

Download and return a specific Left_Right_data_of NPZ file.

list_datasets()

List available datasets with descriptions.

load(url[, cache_dir, force_download, file_type])

Universal data loading function that downloads and reads data from URLs.

load_file(filepath[, file_type])

Load data from file based on file type.

quick_setup()

Quick setup function to get datasets ready.

Module Contents

canns.data.datasets.compute_file_hash(filepath)

Compute SHA256 hash of a file.
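
A minimal sketch of chunked SHA-256 hashing with the standard library's hashlib, shown only to illustrate the technique; it is not the module's source. The name sha256_of_file, the chunk_size parameter, and the hex-string return value are assumptions made for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(filepath, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks.

    Hypothetical helper: name, chunk_size and return format are assumptions.
    """
    digest = hashlib.sha256()
    with Path(filepath).open("rb") as fh:
        # iter() with a b"" sentinel stops once read() hits end of file,
        # so large files never have to fit in memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks gives the same digest as hashing the whole file at once, which is what makes it suitable for verifying large downloaded datasets.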

canns.data.datasets.detect_file_type(filepath)

Detect file type based on extension.
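
As an illustration, extension-based detection can be a lookup from the lowercased suffix to one of the type names that load() accepts (‘text’, ‘numpy’, ‘json’, ‘pickle’, ‘hdf5’). The extension table and the function name guess_file_type are hypothetical; the module's actual mapping and its behavior for unknown extensions are not documented here.

```python
from pathlib import Path

# Hypothetical extension table; the module's real table may differ.
EXTENSION_TYPES = {
    ".txt": "text",
    ".json": "json",
    ".npy": "numpy",
    ".npz": "numpy",
    ".pkl": "pickle",
    ".pickle": "pickle",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
}


def guess_file_type(filepath):
    """Map a file's extension (case-insensitive) to a loader type name, or None."""
    return EXTENSION_TYPES.get(Path(filepath).suffix.lower())
```

Returning None for unknown extensions leaves the decision to the caller, which is one reason load() also lets you force file_type explicitly.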

canns.data.datasets.download_dataset(dataset_key, force=False)

Download a specific dataset.

Parameters:
  • dataset_key (str) – Key of the dataset to download (e.g., ‘grid_1’, ‘roi_data’).

  • force (bool) – Whether to force re-download if file already exists.

Returns:

Path to downloaded file if successful, None otherwise.

Return type:

Path or None

canns.data.datasets.download_file_with_progress(url, filepath, chunk_size=8192)

Download a file with a progress bar.
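
A standard-library sketch of a chunked download with a simple textual progress indicator, using urllib.request in place of whatever HTTP client the module actually uses. The name download_with_progress is hypothetical; chunk_size=8192 matches the documented default. Because urlopen also accepts file:// URLs, the sketch can be exercised offline.

```python
import sys
import urllib.request
from pathlib import Path


def download_with_progress(url, filepath, chunk_size=8192):
    """Stream `url` into `filepath` in `chunk_size`-byte pieces, reporting progress.

    Hypothetical stand-in for download_file_with_progress; stdlib only.
    """
    filepath = Path(filepath)
    filepath.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, filepath.open("wb") as out:
        # Content-Length may be missing; 0 disables the percentage display
        total = int(response.headers.get("Content-Length") or 0)
        done = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            done += len(chunk)
            if total:
                # "\r" redraws the same terminal line, giving a minimal progress bar
                print(f"\r{done * 100 // total:3d}%", end="", file=sys.stderr)
        if total:
            print(file=sys.stderr)  # finish the progress line
    return filepath
```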

canns.data.datasets.get_data_dir()[source]

Get the data directory, creating it if necessary.

canns.data.datasets.get_dataset_path(dataset_key, auto_setup=True)

Get path to a dataset, downloading/setting up if necessary.

Parameters:
  • dataset_key (str) – Key of the dataset.

  • auto_setup (bool) – Whether to automatically attempt setup if the dataset is not found.

Returns:

Path to dataset file if available, None otherwise.

Return type:

Path or None
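
The contract described above (return an existing file, otherwise try to set it up when auto_setup is true, and fall back to None) can be sketched as follows. resolve_dataset and its fetch callback are hypothetical stand-ins for the module's internal lookup and download steps, not its real API.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_dataset(
    data_dir: Path,
    filename: str,
    fetch: Callable[[Path], bool],
    auto_setup: bool = True,
) -> Optional[Path]:
    """Return the local path of a dataset file, fetching it once if allowed.

    Hypothetical sketch of the get_dataset_path contract.
    """
    path = Path(data_dir) / filename
    if path.exists():
        return path          # already available locally, no download needed
    if not auto_setup:
        return None          # caller opted out of automatic setup
    # `fetch` stands in for a download step and reports success as a bool
    if fetch(path) and path.exists():
        return path
    return None
```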

canns.data.datasets.get_huggingface_upload_guide()

Get guide for uploading datasets to Hugging Face.

Returns:

Upload guide text.

Return type:

str

canns.data.datasets.get_left_right_data_session(session_id, auto_download=True, force=False)

Download and return files for a Left_Right_data_of session.

Parameters:
  • session_id (str) – Session folder name, e.g. “24365_2”.

  • auto_download (bool) – Whether to download missing files automatically.

  • force (bool) – Whether to force re-download of existing files.

Returns:

Mapping with keys: “manifest”, “full_file”, “module_files”.

Return type:

dict or None

canns.data.datasets.get_left_right_npz(session_id, filename, auto_download=True, force=False)

Download and return a specific Left_Right_data_of NPZ file.

Parameters:
  • session_id (str) – Session folder name, e.g. “26034_3”.

  • filename (str) – File name inside the session folder, e.g. “26034_3_ASA_mec_gridModule02_n104_cm.npz”.

  • auto_download (bool) – Whether to download the file if missing.

  • force (bool) – Whether to force re-download of existing files.

Returns:

Path to the requested file if available, None otherwise.

Return type:

Path or None
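
The module constants BASE_URL and LEFT_RIGHT_DATASET_DIR suggest how a session file might be addressed on Hugging Face. The sketch below assumes the layout <BASE_URL><LEFT_RIGHT_DATASET_DIR>/<session_id>/<filename>; that layout and the helper left_right_url are assumptions, not documented behavior.

```python
from urllib.parse import quote

# Values copied from the module attributes documented on this page.
BASE_URL = "https://huggingface.co/datasets/canns-team/data-analysis-datasets/resolve/main/"
LEFT_RIGHT_DATASET_DIR = "Left_Right_data_of"


def left_right_url(session_id: str, filename: str) -> str:
    """Build a download URL for one session file (hypothetical helper).

    Assumed layout: <BASE_URL><LEFT_RIGHT_DATASET_DIR>/<session_id>/<filename>
    """
    # quote() percent-encodes each path segment (e.g. spaces become %20)
    parts = (LEFT_RIGHT_DATASET_DIR, session_id, filename)
    return BASE_URL + "/".join(quote(part) for part in parts)
```

To actually fetch a file, call get_left_right_npz("26034_3", "26034_3_ASA_mec_gridModule02_n104_cm.npz") and check the result for None before using it.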

canns.data.datasets.list_datasets()

List available datasets with descriptions.

canns.data.datasets.load(url, cache_dir=None, force_download=False, file_type=None)

Universal data loading function that downloads and reads data from URLs.

Parameters:
  • url (str) – URL to download data from.

  • cache_dir (str or Path, optional) – Directory in which to cache downloaded files. If None, a temporary directory is used.

  • force_download (bool) – Force a re-download even if the file already exists in the cache.

  • file_type (str, optional) – Force a specific file type (‘text’, ‘numpy’, ‘json’, ‘pickle’, ‘hdf5’). If None, the type is auto-detected from the file extension.

Returns:

Loaded data.

Return type:

Any

Examples

>>> from canns.data.datasets import load
>>>
>>> # Load NumPy data
>>> data = load('https://example.com/data.npz')
>>>
>>> # Load text data with a custom cache directory
>>> data = load('https://example.com/data.txt', cache_dir='./cache')
>>>
>>> # Force a specific file type
>>> data = load('https://example.com/data.bin', file_type='numpy')
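
The cache_dir and force_download semantics of load() can be illustrated with a small download-and-cache helper. This is a sketch, not the library's implementation: fetch_cached is a hypothetical name, it uses urllib.request, and it keys the cache on the URL's final path component.

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_cached(url, cache_dir=None, force_download=False):
    """Download `url` into `cache_dir` (a fresh temp dir if None); return the path.

    Hypothetical helper: an existing cached copy is reused unless
    `force_download` is True.
    """
    cache = Path(cache_dir) if cache_dir is not None else Path(tempfile.mkdtemp())
    cache.mkdir(parents=True, exist_ok=True)
    # Name the cached file after the last URL path component, e.g. "data.npz"
    name = Path(urlparse(url).path).name or "download"
    target = cache / name
    if target.exists() and not force_download:
        return target  # cache hit: skip the network entirely
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        out.write(response.read())
    return target
```

Keying on the basename keeps cache files readable, but two URLs ending in the same file name would overwrite each other; hashing the full URL into the cache file name avoids that at the cost of opaque names.
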

canns.data.datasets.load_file(filepath, file_type=None)

Load data from file based on file type.

Parameters:
  • filepath (Path) – Path to the data file.

  • file_type (str, optional) – Force a specific file type. If None, the type is auto-detected from the extension.

Returns:

Loaded data.

Return type:

Any
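
A sketch of the dispatch step for an explicit file_type, using the type names documented for load(). load_by_type is hypothetical, and auto-detection (the file_type=None case) is left to detect_file_type. NumPy and h5py are imported lazily so the sketch runs without them unless those branches are used.

```python
import json
import pickle
from pathlib import Path


def load_by_type(filepath, file_type):
    """Read `filepath` with the reader matching `file_type` (hypothetical helper)."""
    path = Path(filepath)
    if file_type == "text":
        return path.read_text()
    if file_type == "json":
        with path.open() as fh:
            return json.load(fh)
    if file_type == "pickle":
        # Only unpickle files from trusted sources: pickle can execute code
        with path.open("rb") as fh:
            return pickle.load(fh)
    if file_type == "numpy":
        import numpy as np          # optional dependency, imported lazily
        return np.load(path)        # .npy -> ndarray, .npz -> lazy NpzFile
    if file_type == "hdf5":
        import h5py                 # optional dependency, imported lazily
        return h5py.File(path, "r") # caller is responsible for closing it
    raise ValueError(f"unsupported file_type: {file_type!r}")
```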

canns.data.datasets.quick_setup()

Quick setup function to get datasets ready.

Returns:

True if successful, False otherwise.

Return type:

bool

canns.data.datasets.BASE_URL = 'https://huggingface.co/datasets/canns-team/data-analysis-datasets/resolve/main/'
canns.data.datasets.DATASETS
canns.data.datasets.DEFAULT_DATA_DIR
canns.data.datasets.HAS_DOWNLOAD_DEPS = True
canns.data.datasets.HAS_NUMPY = True
canns.data.datasets.HUGGINGFACE_REPO = 'canns-team/data-analysis-datasets'
canns.data.datasets.LEFT_RIGHT_DATASET_DIR = 'Left_Right_data_of'
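
HAS_DOWNLOAD_DEPS and HAS_NUMPY appear to be optional-dependency flags. A common way to compute such a flag without importing the package is importlib.util.find_spec; the sketch below shows that general idiom, and whether this module computes its flags this way is an assumption. The helper has_module is hypothetical.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if top-level module `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


# Hypothetical flag in the style of HAS_NUMPY; the module's own check may differ.
HAS_NUMPY = has_module("numpy")
```

Calling code can then branch on such a flag (for example, skipping NumPy-based loading when it is False) instead of failing with an ImportError deep inside a call.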