base#

Base Dataset class and conversion utilities for IRDL.

This module provides the BaseDataset abstract base class, which serves as the common interface for all Dataset implementations. Each Dataset subclass must implement:

  • _validate_params()

  • _download()

  • _ingest()

  • _source_filename()

The BaseDataset class handles:

  • Common parameter extraction

  • Path construction

  • Cache checking

  • Output format conversion from SOFA
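The four abstract hooks can be illustrated with a dependency-free sketch. SketchBaseDataset below is a hypothetical stand-in that only mirrors the documented method names and signatures; it is not the real irdl.base.BaseDataset. ToyDataset, its room parameter, and the placeholder DOI are likewise invented for illustration, and the sketch assumes the hooks are plain instance methods.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class SketchBaseDataset(ABC):
    """Stand-in for irdl.base.BaseDataset (hypothetical, not the real class)."""

    name: str
    doi: str

    @abstractmethod
    def _validate_params(self, **dataset_kwargs) -> None: ...

    @abstractmethod
    def _source_filename(self, **dataset_kwargs) -> str: ...

    @abstractmethod
    def _download(self, provider_dir: Path, **dataset_kwargs) -> Path: ...

    @abstractmethod
    def _ingest(self, ingest_path: Path): ...


class ToyDataset(SketchBaseDataset):
    name = "toy"  # hypothetical identifier
    doi = "10.0000/placeholder"  # placeholder, not a real DOI

    def _validate_params(self, **dataset_kwargs) -> None:
        # Reject unknown rooms; output_format is also passed in and could be checked here.
        if dataset_kwargs.get("room") not in {"A1", "A2"}:
            raise ValueError("room must be 'A1' or 'A2'")

    def _source_filename(self, **dataset_kwargs) -> str:
        # Name of the ingest-ready file, including its extension.
        return f"{dataset_kwargs['room']}.h5"

    def _download(self, provider_dir: Path, **dataset_kwargs) -> Path:
        # A real subclass would fetch from the provider; this writes placeholder bytes.
        provider_dir.mkdir(parents=True, exist_ok=True)
        target = provider_dir / self._source_filename(**dataset_kwargs)
        target.write_bytes(b"placeholder")
        return target

    def _ingest(self, ingest_path: Path) -> bytes:
        # The real hook returns a sofar.Sofa; bytes keep the sketch dependency-free.
        return ingest_path.read_bytes()


dataset = ToyDataset()
dataset._validate_params(room="A1", output_format="numpy")
with tempfile.TemporaryDirectory() as tmp:
    artifact = dataset._download(Path(tmp) / "provider", room="A1")
    artifact_name = artifact.name
    ingested = dataset._ingest(artifact)
```

Leaving out any one of the four methods makes the subclass abstract, so instantiating it raises TypeError.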

class irdl.base.BaseDataset#

Bases: ABC

Abstract base class providing common interface for all Dataset implementations.

Attributes:
name : str

Unique identifier for the Dataset.

doi : str

Digital Object Identifier for the Dataset.

Methods

_validate_params(**dataset_kwargs) -> None

Validate dataset-specific parameters (including output_format).

_source_filename(**dataset_kwargs) -> str

Construct the ingest-ready filename with extension.

_download(provider_dir: Path, **dataset_kwargs) -> Path

Download the raw file(s) and return the Path to the downloaded artifact.

_process(provider_artifact: Path, ingest_path: Path, **_dataset_kwargs) -> Path

Post-process the downloaded file if needed.

_ingest(ingest_path: Path) -> sofar.Sofa

Convert the processed or raw file to a sofar.Sofa object.

get() @classmethod

Public entry point. Uses an explicit type signature for CLI auto-generation.

abstractmethod _download(provider_dir: Path, **dataset_kwargs) Path#

Concrete download logic. Override in subclass.

_export_raw(provider_artifact: Path, export_dir: Path) Path#

Export raw provider artifact to export directory.

For file artifacts, copies the file with its actual name. For directory artifacts, copies all contents to the output base directory. Raises ValueError if provider_artifact is neither a file nor a directory.

Parameters:
provider_artifact : Path

Path to the downloaded artifact (file or directory).

export_dir : Path

Target export directory.

Returns:
Path

Path to the exported file or directory.
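The file/directory behaviour described for _export_raw could look roughly like the following. This is a standalone sketch based only on the description above, not the library's implementation; the function name export_raw and the choice of shutil.copy2 and shutil.copytree are assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def export_raw(provider_artifact: Path, export_dir: Path) -> Path:
    """Hypothetical helper mirroring the documented _export_raw behaviour."""
    if provider_artifact.is_file():
        # File artifact: copy it under its actual name.
        export_dir.mkdir(parents=True, exist_ok=True)
        target = export_dir / provider_artifact.name
        shutil.copy2(provider_artifact, target)
        return target
    if provider_artifact.is_dir():
        # Directory artifact: copy all contents into the export directory.
        shutil.copytree(provider_artifact, export_dir, dirs_exist_ok=True)
        return export_dir
    raise ValueError(f"neither a file nor a directory: {provider_artifact}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    single = root / "download" / "A1.h5"
    single.parent.mkdir()
    single.write_bytes(b"raw")

    exported_file = export_raw(single, root / "export_file")
    file_ok = exported_file.name == "A1.h5" and exported_file.read_bytes() == b"raw"

    exported_dir = export_raw(single.parent, root / "export_dir")
    dir_ok = (exported_dir / "A1.h5").is_file()

    try:
        export_raw(root / "missing", root / "export_missing")
        missing_raises = False
    except ValueError:
        missing_raises = True
```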

_get(cache_dir: Path | str | None, export_dir: Path | str | None, output_format: str, **dataset_kwargs) dict | Path | None#

Internal implementation of Dataset retrieval.

Parameters:
cache_dir : pathlib.Path or str or None

Cache directory for downloads.

export_dir : pathlib.Path or str or None

Directory for final output. Default is None (stays in cache_dir).

output_format : str

Output format: ‘pyfar’, ‘numpy’, ‘hdf5’, ‘sofa’, or ‘raw’.

**dataset_kwargs : dict

Dataset-specific parameters.

Returns:
dict or pathlib.Path

For ‘pyfar’ / ‘numpy’: a dict of in-memory objects. For ‘sofa’ / ‘hdf5’ / ‘raw’: a pathlib.Path to the file on disk.

_get_doc_prefix = "Download {name} dataset.\n\nParameters\n----------\ncache_dir : str\n    Cache directory for downloads. Defaults is the OS user cache directory.\n    This default can be overridden by setting `IRDL_CACHE_DIR` environment variable.\nexport_dir : str, optional\n    Directory for final output. If specified, the data will be exported to <export_dir/{name}/>. Else, it remains in\n    <cache_dir/output/>.\noutput_format : str\n    Output format: 'pyfar', 'numpy', 'hdf5', 'sofa', or 'raw'.\n"#
abstractmethod _ingest(ingest_path: Path) Sofa#

Convert processed or raw file to sofar.Sofa object.

Override in subclass.

Parameters:
ingest_path : pathlib.Path

Path to the ingest-ready file in the ingest/ subdirectory.

Returns:
sofa : sofar.Sofa

SOFA object representing the Dataset data.

_output_path(output_dir: Path, source_filename: str, output_format: str) Path | None#

Return the canonical Path where a file-based output would be written.

Returns None for the ‘pyfar’, ‘numpy’, and ‘raw’ formats. Otherwise, constructs the Path from the source filename, the target directory, and the output format.

Parameters:
output_dir : Path

The output directory: either cache_dir/output, export_dir, or export_dir/raw.

source_filename : str

The name of the ingestible file, as constructed by _source_filename.

output_format : str

One of ‘pyfar’, ‘numpy’, ‘hdf5’, ‘sofa’, ‘raw’.

Returns:
Path or None

Canonical output path, or None for formats that have none (‘pyfar’, ‘numpy’, ‘raw’).
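One way to picture the path construction is the sketch below. It assumes that the output name is the source filename with its extension swapped for ".h5" or ".sofa" (the extensions mentioned elsewhere on this page); the real naming rule may differ. The names output_path, FILE_EXTENSIONS, and NO_CANONICAL_PATH are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping, taken from the ".h5" and ".sofa" extensions named in these docs.
FILE_EXTENSIONS = {"hdf5": ".h5", "sofa": ".sofa"}
# Formats for which the documented method returns None.
NO_CANONICAL_PATH = {"pyfar", "numpy", "raw"}


def output_path(output_dir: Path, source_filename: str, output_format: str) -> Optional[Path]:
    """Hypothetical stand-in for the documented _output_path."""
    if output_format in NO_CANONICAL_PATH:
        return None
    if output_format not in FILE_EXTENSIONS:
        raise ValueError(f"unknown output_format: {output_format!r}")
    # Assumption: keep the stem of the source filename and swap the extension.
    new_name = Path(source_filename).with_suffix(FILE_EXTENSIONS[output_format]).name
    return output_dir / new_name


sofa_target = output_path(Path("export"), "A1.h5", "sofa")
hdf5_target = output_path(Path("export"), "FABIAN_HRIR_measured_HATO_0.sofa", "hdf5")
numpy_target = output_path(Path("export"), "A1.h5", "numpy")
```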

_process(provider_artifact: Path, ingest_path: Path, **_dataset_kwargs) Path#

Post-process downloaded file if needed.

Override in subclass to extract, merge, or otherwise transform the downloaded data. Write the processed, ingest-ready file to ingest_path and return it.

The default implementation promotes the provider file to the ingest stage. If the provider path is a file and differs from the ingest path, it creates a hard link (or falls back to a copy) so the ingest file exists.

Parameters:
provider_artifact : Path

Path to the freshly downloaded file (or download directory).

ingest_path : pathlib.Path

Path to the ingestible file in the ingest directory.

**_dataset_kwargs : dict

Dataset-specific parameters (unused by the default implementation).

Returns:
ingest_path : Path

The processed, ingest-ready file at ingest_path.
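The default "hard link, else copy" promotion described above can be sketched with the standard library. This is an illustration of the technique only; promote_to_ingest is a hypothetical name, and leaving an already existing ingest file untouched is an assumption the documentation does not state.

```python
import os
import shutil
import tempfile
from pathlib import Path


def promote_to_ingest(provider_artifact: Path, ingest_path: Path) -> Path:
    """Hypothetical sketch of the default _process behaviour."""
    if provider_artifact.is_file() and provider_artifact != ingest_path:
        ingest_path.parent.mkdir(parents=True, exist_ok=True)
        # Assumption: an existing ingest file is left as it is.
        if not ingest_path.exists():
            try:
                # A hard link avoids duplicating the data on disk.
                os.link(provider_artifact, ingest_path)
            except OSError:
                # Fall back to a copy, e.g. across filesystems or where
                # hard links are not supported.
                shutil.copy2(provider_artifact, ingest_path)
    return ingest_path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    downloaded = root / "provider" / "A1.h5"
    downloaded.parent.mkdir()
    downloaded.write_bytes(b"raw")

    promoted = promote_to_ingest(downloaded, root / "ingest" / "A1.h5")
    promoted_name = promoted.name
    promoted_bytes = promoted.read_bytes()
```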

abstractmethod _source_filename(**dataset_kwargs) str#

Construct the ingest-ready filename with extension for the dataset.

Override in subclass.

This name is canonical: _get treats the existence of that path as proof that download and processing are already done (if so it skips both and ingests the file directly). The name therefore must match the file that actually ends up on disk after _download + _process: i.e. the processed file (merged/extracted), which is not necessarily the raw download.

Parameters:
**dataset_kwargs : dict

Dataset-specific parameters used to construct the filename.

Returns:
str

The ingest-ready filename including extension (e.g., “A1.h5”, “FABIAN_HRIR_measured_HATO_0.sofa”).
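The caching contract, where the existence of the ingest-ready file means download and processing are skipped, can be sketched as follows. The helper resolve_ingest_file and the callback download_and_process are hypothetical names; the real _get is likely to involve more steps.

```python
import tempfile
from pathlib import Path
from typing import Callable


def resolve_ingest_file(
    ingest_dir: Path,
    source_filename: str,
    download_and_process: Callable[[Path], Path],
) -> Path:
    """Return the ingest-ready file, producing it only if it is missing."""
    ingest_path = ingest_dir / source_filename
    if ingest_path.exists():
        # Cache hit: the file's existence is taken as proof that
        # download and processing have already happened.
        return ingest_path
    return download_and_process(ingest_path)


calls = []


def fake_download_and_process(ingest_path: Path) -> Path:
    # Stand-in for the download and process steps; records each invocation.
    calls.append(ingest_path.name)
    ingest_path.parent.mkdir(parents=True, exist_ok=True)
    ingest_path.write_bytes(b"processed")
    return ingest_path


with tempfile.TemporaryDirectory() as tmp:
    ingest_dir = Path(tmp) / "ingest"
    first = resolve_ingest_file(ingest_dir, "A1.h5", fake_download_and_process)
    second = resolve_ingest_file(ingest_dir, "A1.h5", fake_download_and_process)
    same_path = first == second
```

The second call finds the file and never invokes the callback. This is why a filename that does not match the file actually written would defeat the cache and trigger a download every time.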

_to_hdf5(sofa: Sofa, ingest_path: Path, output_path: Path) Path#

Convert sofar.Sofa to HDF5 file and return Path.

Parameters:
sofa : sofar.Sofa

SOFA object to convert.

ingest_path : pathlib.Path

Path to the ingestible file.

output_path : pathlib.Path

Path where the .h5 file should be written.

Returns:
pathlib.Path

Path to the written HDF5 file.

_to_numpy(sofa: Sofa) dict#

Convert sofar.Sofa to dict of numpy arrays.

Parameters:
sofa : sofar.Sofa

SOFA object to convert.

Returns:
dict

Dictionary with keys:

  • “impulse_response” : numpy.ndarray

  • “source_coordinates” : numpy.ndarray

  • “receiver_coordinates” : numpy.ndarray

  • “sampling_rate” : float

_to_output(sofa: Sofa, output_format: str, ingest_path: Path, output_path: Path | None) dict | Path#

Convert sofar.Sofa to the requested output format.

Parameters:
sofa : sofar.Sofa

SOFA object to convert.

output_format : str

One of “pyfar”, “numpy”, “hdf5”, “sofa”.

ingest_path : pathlib.Path

Path to the ingestible file. It is passed as well so that file-based exports can work on the file directly and avoid loading it into memory.

output_path : pathlib.Path or None

Path where file-based outputs should be written.

Returns:
dict or pathlib.Path

Output depends on output_format:

  • “pyfar” : dict of pyfar.Signal and pyfar.Coordinates objects

  • “numpy” : dict of numpy.ndarray arrays

  • “hdf5” : pathlib.Path to .h5 file

  • “sofa” : pathlib.Path to .sofa file

_to_pyfar(sofa: Sofa) dict#

Convert sofar.Sofa to dict of pyfar objects.

Parameters:
sofa : sofar.Sofa

SOFA object to convert.

Returns:
dict

Dictionary with keys:

  • “impulse_response” : pyfar.Signal

  • “source_coordinates” : pyfar.Coordinates

  • “receiver_coordinates” : pyfar.Coordinates

_to_sofa(sofa: Sofa, ingest_path: Path, output_path: Path) Path#

Write sofar.Sofa to file and return Path.

Parameters:
sofa : sofar.Sofa

SOFA object to write.

ingest_path : pathlib.Path

Path to the ingestible file.

output_path : pathlib.Path

Path where the .sofa file should be written.

Returns:
pathlib.Path

Path to the written SOFA file.

abstractmethod _validate_params(**dataset_kwargs) None#

Validate dataset-specific parameters.

Override in subclass. This method receives dataset-specific parameters plus output_format (so subclasses can forbid invalid output_format / dataset-parameter combinations).

Parameters:
**dataset_kwargs : dict

Dataset-specific parameters to validate, including output_format.

Raises:
ValueError

If any parameter is invalid.

doi: str#
download(provider_dir: Path, **dataset_kwargs) Path#

Download raw files and return Path to the primary artifact.

This method wraps _download to enforce provider_dir existence for all subclasses.

Parameters:
provider_dir : pathlib.Path

Target path where the file(s) should be downloaded to. For single-file providers, this may be the file path itself. For multi-file providers, this may be a directory where files are placed.

**dataset_kwargs : dict

Dataset-specific parameters.

Returns:
provider_artifact : pathlib.Path

Path to the downloaded artifact on disk (file or directory).

name: str#
process(provider_artifact: Path, ingest_path: Path, **dataset_kwargs) Path#

Post-process downloaded file if needed.

This method wraps _process to enforce ingest_dir existence for all subclasses.

Parameters:
provider_artifact : Path

Path to the freshly downloaded file (or download directory).

ingest_path : pathlib.Path

Path to the ingestible file in the ingest directory.

**dataset_kwargs : dict

Dataset-specific parameters passed through to _process.

Returns:
ingest_path : Path

The processed, ingest-ready file at ingest_path.

class irdl.base.DatasetCategory(*values)#

Bases: StrEnum

Categories for grouping datasets.

ROOM_IMPULSE_RESPONSES = 'room_impulse_responses'#
static _generate_next_value_(name, start, count, last_values)#

Return the lower-cased version of the member name.

class irdl.base.SofaBaseDataset#

Bases: BaseDataset

Base class for datasets whose ingest-ready format is already SOFA.

The primary distinction is that output_format='sofa' can directly copy or link the ingest-ready file, which avoids having to write the SOFA file out again from memory.

_ingest(ingest_path: Path) Sofa#

Load SOFA file into sofar.Sofa object.

Parameters:
ingest_path : pathlib.Path

Path to the SOFA file in the ingest directory.

Returns:
sofar.Sofa

SOFA object containing the dataset data.

_to_sofa(sofa: Sofa, ingest_path: Path, output_path: Path) Path#

Copy the SOFA file from the ingest directory and return its Path.

Parameters:
sofa : sofar.Sofa

SOFA object to write.

ingest_path : pathlib.Path

Path to the ingestible file.

output_path : pathlib.Path

Path where the .sofa file should be written to.

Returns:
pathlib.Path

Path to the written SOFA file.

irdl.base._get_dataset_classes(module: ModuleType) list[type]#

Return concrete BaseDataset subclasses exported by module.