`base`#

Base Dataset class and conversion utilities for IRDL.

This module provides the BaseDataset abstract base class which serves as the common interface for all Dataset implementations. Each Dataset subclass must implement:

_validate_params()
_download()
_ingest()
_source_filename()

The BaseDataset class handles:

Common parameter extraction
Path construction
Cache checking
Output format conversion from SOFA
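The division of labour above can be sketched as follows. This is a minimal illustration, not the library's code: `SketchBaseDataset` is a local stand-in for BaseDataset (so the snippet runs without irdl installed), and `ToyDataset`, its `subject` parameter, and the placeholder DOI are hypothetical. Whether the real hooks are instance methods or class methods is an assumption here.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SketchBaseDataset(ABC):
    """Hypothetical stand-in exposing the four hooks a subclass must implement."""

    name: str
    doi: str

    @abstractmethod
    def _validate_params(self, **dataset_kwargs) -> None: ...

    @abstractmethod
    def _source_filename(self, **dataset_kwargs) -> str: ...

    @abstractmethod
    def _download(self, provider_dir: Path, **dataset_kwargs) -> Path: ...

    @abstractmethod
    def _ingest(self, ingest_path: Path): ...


class ToyDataset(SketchBaseDataset):
    name = "toy"
    doi = "10.0000/example"  # placeholder, not a real DOI

    def _validate_params(self, **dataset_kwargs) -> None:
        if dataset_kwargs.get("subject") not in {"A1", "A2"}:
            raise ValueError("subject must be 'A1' or 'A2'")

    def _source_filename(self, **dataset_kwargs) -> str:
        return f"{dataset_kwargs['subject']}.h5"

    def _download(self, provider_dir: Path, **dataset_kwargs) -> Path:
        # A real subclass would fetch from the network; this writes a dummy file.
        target = provider_dir / self._source_filename(**dataset_kwargs)
        target.write_bytes(b"dummy")
        return target

    def _ingest(self, ingest_path: Path):
        # A real subclass returns a sofar.Sofa; raw bytes stand in for it here.
        return ingest_path.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    dataset = ToyDataset()
    dataset._validate_params(subject="A1", output_format="numpy")
    artifact = dataset._download(Path(tmp), subject="A1")
    payload = dataset._ingest(artifact)
```

The base class then only needs to call these hooks in order; parameter extraction, path construction, cache checks, and output conversion stay out of the subclass.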

class irdl.base.BaseDataset#

Bases: ABC

Abstract base class providing common interface for all Dataset implementations.

Attributes:

name (str): Unique identifier for the Dataset.
doi (str): Digital Object Identifier for the Dataset.

Methods

_validate_params(**dataset_kwargs): Validate dataset-specific parameters (including output_format).
_source_filename(**dataset_kwargs) -> str: Construct the ingest-ready filename with extension.
_download(provider_dir: Path, **dataset_kwargs) -> Path: Download and return the Path to the raw file.
_process(provider_artifact: Path, ingest_path: Path, **_dataset_kwargs) -> Path: Post-process the downloaded file if needed.
_ingest(ingest_path: Path) -> sofar.Sofa: Convert the processed or raw file to a sofar.Sofa object.
get() (classmethod): Public entry point. Uses an explicit type signature for CLI auto-generation.

abstractmethod _download(provider_dir: Path, **dataset_kwargs) → Path#: Concrete download logic. Override in subclass.

_export_raw(provider_artifact: Path, export_dir: Path) → Path#

Export raw provider artifact to export directory.

For file artifacts, copies the file with its actual name. For directory artifacts, copies all contents to the output base directory. Raises ValueError if provider_artifact is neither a file nor a directory.

Parameters:

provider_artifact (Path): Path to the downloaded artifact (file or directory).
export_dir (Path): Target export directory.

Returns:

Path: Path to the exported file or directory.
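The file/directory branching described for _export_raw can be sketched with the standard library. This is an illustrative re-implementation under assumptions, not the library's code: the function name `export_raw` and the choice of `shutil.copy2` / `shutil.copytree` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def export_raw(provider_artifact: Path, export_dir: Path) -> Path:
    """Copy a file (keeping its name) or a directory's contents into export_dir."""
    if provider_artifact.is_file():
        export_dir.mkdir(parents=True, exist_ok=True)
        target = export_dir / provider_artifact.name
        shutil.copy2(provider_artifact, target)
        return target
    if provider_artifact.is_dir():
        # dirs_exist_ok lets the copy merge into an existing export directory.
        shutil.copytree(provider_artifact, export_dir, dirs_exist_ok=True)
        return export_dir
    raise ValueError(f"{provider_artifact} is neither a file nor a directory")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    raw_file = root / "download" / "A1.h5"
    raw_file.parent.mkdir()
    raw_file.write_bytes(b"dummy")

    exported_file = export_raw(raw_file, root / "export_file")
    exported_dir = export_raw(raw_file.parent, root / "export_dir")

    file_ok = exported_file.read_bytes() == b"dummy"
    dir_ok = (exported_dir / "A1.h5").is_file()
```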

_get: Internal implementation of Dataset retrieval.

Parameters:

cache_dir (pathlib.Path or str or None): Cache directory for downloads.
export_dir (pathlib.Path or str or None): Directory for final output. Default is None (output stays in cache_dir).
output_format (str): Output format: ‘pyfar’, ‘numpy’, ‘hdf5’, ‘sofa’, or ‘raw’.
**dataset_kwargs (dict): Dataset-specific parameters.

Returns:

dict or pathlib.Path: For ‘pyfar’ / ‘numpy’: a dict of in-memory objects. For ‘sofa’ / ‘hdf5’ / ‘raw’: a pathlib.Path to the file on disk.

_get_doc_prefix = "Download {name} dataset.\n\nParameters\n----------\ncache_dir : str\n Cache directory for downloads. Defaults is the OS user cache directory.\n This default can be overridden by setting `IRDL_CACHE_DIR` environment variable.\nexport_dir : str, optional\n Directory for final output. If specified, the data will be exported to <export_dir/{name}/>. Else, it remains in\n <cache_dir/output/>.\noutput_format : str\n Output format: 'pyfar', 'numpy', 'hdf5', 'sofa', or 'raw'.\n"#

abstractmethod _ingest(ingest_path: Path) → Sofa#

Convert processed or raw file to sofar.Sofa object.

Override in subclass.

Parameters:

ingest_path (pathlib.Path): Path to the ingest-ready file in the ingest/ subdirectory.

Returns:

sofa (sofar.Sofa): SOFA object representing the Dataset data.

_output_path(output_dir: Path, source_filename: str, output_format: str) → Path | None#

Return the canonical Path where a file-based output would be written.

Returns None for the formats ‘pyfar’, ‘numpy’, and ‘raw’. For the other formats, constructs the Path from the source filename, the target directory, and the output format.

Parameters:

output_dir (Path): The output directory. Either cache_dir/output, export_dir, or export_dir/raw.
source_filename (str): The name of the ingestible file. Constructed with _source_filename.
output_format (str): One of ‘pyfar’, ‘numpy’, ‘hdf5’, ‘sofa’, ‘raw’.

Returns:

Path or None: Canonical output path, or None for ‘pyfar’, ‘numpy’, and ‘raw’.
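A minimal sketch of this path construction follows. The extension mapping is an assumption based only on this page's statements that ‘hdf5’ produces a .h5 file and ‘sofa’ a .sofa file; the function name `output_path` and the choice to swap the suffix of the source filename are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping from file-based formats to extensions.
_EXTENSIONS = {"hdf5": ".h5", "sofa": ".sofa"}
# Formats for which no single canonical output file is constructed.
_NO_PATH_FORMATS = {"pyfar", "numpy", "raw"}


def output_path(output_dir: Path, source_filename: str, output_format: str) -> Optional[Path]:
    """Return the canonical output file path, or None when none applies."""
    if output_format in _NO_PATH_FORMATS:
        return None
    if output_format not in _EXTENSIONS:
        raise ValueError(f"Unsupported output_format: {output_format!r}")
    # Keep the stem of the ingest-ready file and swap in the target extension.
    return output_dir / Path(source_filename).with_suffix(_EXTENSIONS[output_format]).name


sofa_target = output_path(Path("out"), "A1.h5", "sofa")
hdf5_target = output_path(Path("out"), "FABIAN_HRIR_measured_HATO_0.sofa", "hdf5")
numpy_target = output_path(Path("out"), "A1.h5", "numpy")
```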

_process(provider_artifact: Path, ingest_path: Path, **_dataset_kwargs) → Path#

Post-process downloaded file if needed.

Override in subclass to extract, merge, or otherwise transform the downloaded data. Write the processed, ingest-ready file to ingest_path and return it.

The default implementation promotes the provider file to the ingest stage. If the provider path is a file and differs from the ingest path, it creates a hard link (or falls back to a copy) so the ingest file exists.

Parameters:

provider_artifact (Path): Path to the freshly downloaded file (or download directory).
ingest_path (pathlib.Path): Path to the ingestible file in the ingest directory.
**_dataset_kwargs (dict): Dataset-specific parameters (unused by the default implementation).

Returns:

ingest_path (Path): The processed, ingest-ready file at ingest_path.
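The default promotion step (hard link, falling back to a copy) could look roughly like the sketch below. The function name `promote_to_ingest` is hypothetical, and details such as creating the parent directory or skipping an already existing ingest file are assumptions rather than documented behaviour.

```python
import os
import shutil
import tempfile
from pathlib import Path


def promote_to_ingest(provider_artifact: Path, ingest_path: Path) -> Path:
    """Make the provider file available at ingest_path without reprocessing it."""
    if provider_artifact.is_file() and provider_artifact != ingest_path:
        ingest_path.parent.mkdir(parents=True, exist_ok=True)
        if not ingest_path.exists():
            try:
                # A hard link avoids duplicating the data on disk.
                os.link(provider_artifact, ingest_path)
            except OSError:
                # For example, linking across filesystems is not possible.
                shutil.copy2(provider_artifact, ingest_path)
    return ingest_path


with tempfile.TemporaryDirectory() as tmp:
    provider_file = Path(tmp) / "provider" / "A1.h5"
    provider_file.parent.mkdir()
    provider_file.write_bytes(b"dummy")

    ingest_file = promote_to_ingest(provider_file, Path(tmp) / "ingest" / "A1.h5")
    promoted_bytes = ingest_file.read_bytes()
```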

abstractmethod _source_filename(**dataset_kwargs) → str#

Construct the ingest-ready filename with extension for the dataset.

Override in subclass.

This name is canonical: _get treats the existence of that path as proof that download and processing are already done (if so it skips both and ingests the file directly). The name therefore must match the file that actually ends up on disk after _download + _process: i.e. the processed file (merged/extracted), which is not necessarily the raw download.

Parameters:

**dataset_kwargs (dict): Dataset-specific parameters used to construct the filename.

Returns:

str: The ingest-ready filename including extension (e.g., “A1.h5”, “FABIAN_HRIR_measured_HATO_0.sofa”).
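The caching rule above amounts to an existence check on the path built from this filename. A minimal sketch, assuming a hypothetical helper `is_cached` and an ingest directory layout chosen only for illustration:

```python
import tempfile
from pathlib import Path


def is_cached(ingest_dir: Path, source_filename: str) -> bool:
    """Return True when the ingest-ready file already exists (hypothetical helper)."""
    return (ingest_dir / source_filename).is_file()


with tempfile.TemporaryDirectory() as tmp:
    ingest_dir = Path(tmp) / "ingest"
    ingest_dir.mkdir()
    filename = "A1.h5"  # what a subclass's _source_filename might return

    cached_before = is_cached(ingest_dir, filename)  # download + process needed
    (ingest_dir / filename).write_bytes(b"processed")
    cached_after = is_cached(ingest_dir, filename)   # both steps can be skipped
```

If the returned name does not match the file that download and processing leave on disk, a check like this never succeeds and the dataset is fetched again on every call.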

_to_hdf5(sofa: Sofa, ingest_path: Path, output_path: Path) → Path#

Convert sofar.Sofa to HDF5 file and return Path.

Parameters:

sofa (sofar.Sofa): SOFA object to convert.
ingest_path (pathlib.Path): Path to the ingestible file.
output_path (pathlib.Path): Path where the .h5 file should be written.

Returns:

pathlib.Path: Path to the written HDF5 file.

_to_numpy(sofa: Sofa) → dict#

Convert sofar.Sofa to dict of numpy arrays.

Parameters:

sofa (sofar.Sofa): SOFA object to convert.

Returns:

dict: Dictionary with keys:

“impulse_response”: numpy.ndarray
“source_coordinates”: numpy.ndarray
“receiver_coordinates”: numpy.ndarray
“sampling_rate”: float

_to_output(sofa: Sofa, output_format: str, ingest_path: Path, output_path: Path | None) → dict | Path#

Convert sofar.Sofa to the requested output format.

Parameters:

sofa (sofar.Sofa): SOFA object to convert.
output_format (str): One of “pyfar”, “numpy”, “hdf5”, “sofa”.
ingest_path (pathlib.Path): Path to the ingestible file. It is passed as well so that file-based exports can work directly on the file instead of loading it into memory.
output_path (pathlib.Path or None): Path where file-based outputs should be written.

Returns:

dict or pathlib.Path: Output depends on output_format:

“pyfar”: dict of pyfar.Signal and pyfar.Coordinates objects
“numpy”: dict of numpy.ndarray arrays
“hdf5”: pathlib.Path to the .h5 file
“sofa”: pathlib.Path to the .sofa file
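One way to picture this dispatch is a pair of lookup tables: in-memory converters take only the SOFA object, while file-based converters also receive ingest_path and output_path, matching the signatures of _to_pyfar / _to_numpy and _to_hdf5 / _to_sofa. The sketch below is an assumption about the structure, not the library's code; `to_output` and the lambda converters are hypothetical stand-ins.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Union

InMemory = Dict[str, Callable[[Any], dict]]
FileBased = Dict[str, Callable[[Any, Path, Path], Path]]


def to_output(
    sofa: Any,
    output_format: str,
    ingest_path: Path,
    output_path: Optional[Path],
    in_memory: InMemory,
    file_based: FileBased,
) -> Union[dict, Path]:
    """Route the SOFA object to the converter registered for output_format."""
    if output_format in in_memory:
        return in_memory[output_format](sofa)
    if output_format in file_based:
        if output_path is None:
            raise ValueError(f"{output_format!r} requires an output_path")
        return file_based[output_format](sofa, ingest_path, output_path)
    raise ValueError(f"Unsupported output_format: {output_format!r}")


# Stand-in converters; real ones would build numpy arrays or write files.
in_memory = {"numpy": lambda sofa: {"impulse_response": sofa}}
file_based = {"sofa": lambda sofa, ingest_path, output_path: output_path}

result_numpy = to_output("fake-sofa", "numpy", Path("ingest/A1.sofa"), None, in_memory, file_based)
result_sofa = to_output(
    "fake-sofa", "sofa", Path("ingest/A1.sofa"), Path("out/A1.sofa"), in_memory, file_based
)
```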

_to_pyfar(sofa: Sofa) → dict#

Convert sofar.Sofa to dict of pyfar objects.

Parameters:

sofa (sofar.Sofa): SOFA object to convert.

Returns:

dict: Dictionary with keys:

“impulse_response”: pyfar.Signal
“source_coordinates”: pyfar.Coordinates
“receiver_coordinates”: pyfar.Coordinates

_to_sofa(sofa: Sofa, ingest_path: Path, output_path: Path) → Path#

Write sofar.Sofa to file and return Path.

Parameters:

sofa (sofar.Sofa): SOFA object to write.
ingest_path (pathlib.Path): Path to the ingestible file.
output_path (pathlib.Path): Path where the .sofa file should be written.

Returns:

pathlib.Path: Path to the written SOFA file.

abstractmethod _validate_params(**dataset_kwargs) → None#

Validate dataset-specific parameters.

Override in subclass. This method receives dataset-specific parameters plus output_format (so subclasses can forbid invalid output_format / dataset-parameter combinations).

Parameters:

**dataset_kwargs (dict): Dataset-specific parameters to validate, including output_format.

Raises:

ValueError: If any parameter is invalid.

doi: str#

download(provider_dir: Path, **dataset_kwargs) → Path#

Download raw files and return Path to the primary artifact.

This method wraps _download to enforce provider_dir existence for all subclasses.

Parameters:

provider_dir (pathlib.Path): Target path to which the file(s) should be downloaded. For single-file providers, this may be the file path itself. For multi-file providers, this may be a directory where files are placed.
**dataset_kwargs (dict): Dataset-specific parameters.

Returns:

provider_artifact (pathlib.Path): Path to the downloaded artifact on disk (file or directory).

name: str#

process(provider_artifact: Path, ingest_path: Path, **dataset_kwargs) → Path#

Post-process downloaded file if needed.

This method wraps _process to enforce ingest_dir existence for all subclasses.

Parameters:

provider_artifact (Path): Path to the freshly downloaded file (or download directory).
ingest_path (pathlib.Path): Path to the ingestible file in the ingest directory.
**dataset_kwargs (dict): Dataset-specific parameters passed through to _process.

Returns:

ingest_path (Path): The processed, ingest-ready file at ingest_path.

class irdl.base.DatasetCategory(*values)#

Bases: StrEnum

Categories for grouping datasets.

HEAD_RELATED_IMPULSE_RESPONSES = 'head_related_impulse_responses'#

ROOM_IMPULSE_RESPONSES = 'room_impulse_responses'#

static _generate_next_value_(name, start, count, last_values)#: Return the lower-cased version of the member name.

class irdl.base.SofaBaseDataset#

Bases: BaseDataset

Base class for datasets whose ingest-ready format is already SOFA.

The primary distinction is that output_format='sofa' can directly copy or link the ingest-ready file, which avoids writing the SOFA file out again from the in-memory object.

_ingest(ingest_path: Path) → Sofa#

Load SOFA file into sofar.Sofa object.

Parameters:

ingest_path (pathlib.Path): Path to the SOFA file in the ingest directory.

Returns:

sofar.Sofa: SOFA object containing the dataset data.

_to_sofa(sofa: Sofa, ingest_path: Path, output_path: Path) → Path#

Copy the SOFA file from the ingest directory and return its Path.

Parameters:

sofa (sofar.Sofa): SOFA object to write.
ingest_path (pathlib.Path): Path to the ingestible file.
output_path (pathlib.Path): Path where the .sofa file should be written.

Returns:

pathlib.Path: Path to the written SOFA file.

irdl.base._get_dataset_classes(module: ModuleType) → list[type]#: Return concrete BaseDataset subclasses exported by module.
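A discovery helper of this kind can be sketched with `inspect`. This is a guess at the approach, not the library's implementation: `Base` stands in for BaseDataset, the function takes the base class as an explicit argument, and "exported by module" is assumed to mean "present as a module attribute" (the real function may consult `__all__` or use other rules).

```python
import inspect
from abc import ABC, abstractmethod
from types import ModuleType


class Base(ABC):
    """Hypothetical stand-in for BaseDataset."""

    @abstractmethod
    def _ingest(self, ingest_path): ...


def get_dataset_classes(module: ModuleType, base: type) -> list[type]:
    """Return the non-abstract subclasses of base found among module attributes."""
    return [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base and not inspect.isabstract(obj)
    ]


class Concrete(Base):
    def _ingest(self, ingest_path):
        return ingest_path


class StillAbstract(Base):
    """Does not implement _ingest, so it remains abstract."""


# Build a throwaway module to search.
demo = ModuleType("demo")
demo.Base = Base
demo.Concrete = Concrete
demo.StillAbstract = StillAbstract
demo.unrelated = 3

found = get_dataset_classes(demo, Base)
```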
