base
Base Dataset class and conversion utilities for IRDL.
This module provides the BaseDataset abstract base class which serves as the common interface for all Dataset implementations. Each Dataset subclass must implement:
_validate_params()
_download()
_ingest()
_source_filename()
The BaseDataset class handles:
Common parameter extraction
Path construction
Cache checking
Output format conversion from SOFA
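The subclass contract listed above can be sketched with a stand-in abstract class. This is not the irdl implementation: `SketchDataset`, `ToyDataset`, and the `subject` parameter are hypothetical, the hooks are written as plain instance methods, and `_ingest` returns a plain dict instead of a sofar.Sofa object so the example runs with the standard library alone.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SketchDataset(ABC):
    """Hypothetical stand-in mirroring the four abstract hooks of BaseDataset."""

    @abstractmethod
    def _validate_params(self, **dataset_kwargs) -> None: ...

    @abstractmethod
    def _source_filename(self, **dataset_kwargs) -> str: ...

    @abstractmethod
    def _download(self, provider_dir: Path, **dataset_kwargs) -> Path: ...

    @abstractmethod
    def _ingest(self, ingest_path: Path): ...


class ToyDataset(SketchDataset):
    def _validate_params(self, **dataset_kwargs) -> None:
        if dataset_kwargs.get("subject", 0) < 0:
            raise ValueError("subject must be non-negative")

    def _source_filename(self, **dataset_kwargs) -> str:
        return f"subject_{dataset_kwargs['subject']:03d}.txt"

    def _download(self, provider_dir: Path, **dataset_kwargs) -> Path:
        # A real subclass would fetch from a remote provider; this writes a local file.
        target = provider_dir / self._source_filename(**dataset_kwargs)
        target.write_text("0.0 1.0 0.5")
        return target

    def _ingest(self, ingest_path: Path):
        # Stand-in for building a sofar.Sofa object.
        return {"impulse_response": [float(x) for x in ingest_path.read_text().split()]}


with tempfile.TemporaryDirectory() as tmp:
    dataset = ToyDataset()
    dataset._validate_params(subject=3)
    raw = dataset._download(Path(tmp), subject=3)
    data = dataset._ingest(raw)
```

Because every hook is abstract, instantiating `SketchDataset` directly raises `TypeError`; only a subclass that implements all four can be created.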
- class irdl.base.BaseDataset
Bases: ABC
Abstract base class providing a common interface for all Dataset implementations.
- Attributes:
Methods
_validate_params(**dataset_kwargs)
Validate dataset-specific parameters (including output_format).
_source_filename(**dataset_kwargs) -> str
Construct the raw input filename with extension.
_download(**dataset_kwargs) -> Path
Download and return Path to raw file.
_process(provider_artifact: Path, ingest_path: Path, **_dataset_kwargs) -> Path
Post-process downloaded file if needed.
_ingest(ingest_path: Path) -> sofar.Sofa
Convert processed or raw file to sofar.Sofa object.
get() @classmethod
Public entry point. Uses explicit type signature for CLI auto-generation.
- abstractmethod _download(provider_dir: Path, **dataset_kwargs) -> Path
Concrete download logic. Override in subclass.
- _export_raw(provider_artifact: Path, export_dir: Path) -> Path
Export raw provider artifact to export directory.
For file artifacts, copies the file with its actual name. For directory artifacts, copies all contents to the output base directory. Raises ValueError if provider_artifact is neither a file nor a directory.
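The file-versus-directory behaviour described above might look roughly like the following. This is a sketch built on `shutil`, not irdl's code: the function name `export_raw` is hypothetical, and returning the export directory for directory artifacts is an assumption, since the entry does not say what is returned in that case.

```python
import shutil
import tempfile
from pathlib import Path


def export_raw(provider_artifact: Path, export_dir: Path) -> Path:
    """Hypothetical re-creation of the documented behaviour."""
    if provider_artifact.is_file():
        # File artifact: copy it under its actual name.
        export_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy2(provider_artifact, export_dir / provider_artifact.name))
    if provider_artifact.is_dir():
        # Directory artifact: copy all contents into the export directory.
        shutil.copytree(provider_artifact, export_dir, dirs_exist_ok=True)
        return export_dir  # assumption: the entry does not state this return value
    raise ValueError(f"{provider_artifact} is neither a file nor a directory")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    artifact = root / "download" / "measurement.sofa"
    artifact.parent.mkdir()
    artifact.write_bytes(b"raw bytes")

    exported_file = export_raw(artifact, root / "export_file")
    exported_dir = export_raw(artifact.parent, root / "export_dir")
    file_ok = exported_file.read_bytes() == b"raw bytes"
    dir_ok = (exported_dir / "measurement.sofa").exists()

    try:
        export_raw(root / "missing", root / "export_missing")
        raised = False
    except ValueError:
        raised = True
```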
- _get(cache_dir: Path | str | None, export_dir: Path | str | None, output_format: str, **dataset_kwargs) -> dict | Path | None
Internal implementation of Dataset retrieval.
- Parameters:
- cache_dir
pathlib.Path or str or None: Cache directory for downloads.
- export_dir
pathlib.Path or str or None: Directory for final output. Default is None (stays in cache_dir).
- output_format
str: Output format: ‘pyfar’, ‘numpy’, ‘hdf5’, ‘sofa’, or ‘raw’.
- **dataset_kwargs
dict: Dataset-specific parameters.
- Returns:
dict or pathlib.Path: For ‘pyfar’ / ‘numpy’, a dict of in-memory objects. For ‘sofa’ / ‘hdf5’ / ‘raw’, a pathlib.Path to the file on disk.
- _get_doc_prefix = "Download {name} dataset.\n\nParameters\n----------\ncache_dir : str\n Cache directory for downloads. Defaults is the OS user cache directory.\n This default can be overridden by setting `IRDL_CACHE_DIR` environment variable.\nexport_dir : str, optional\n Directory for final output. If specified, the data will be exported to <export_dir/{name}/>. Else, it remains in\n <cache_dir/output/>.\noutput_format : str\n Output format: 'pyfar', 'numpy', 'hdf5', 'sofa', or 'raw'.\n"#
- abstractmethod _ingest(ingest_path: Path) -> Sofa
Convert processed or raw file to sofar.Sofa object.
Override in subclass.
- Parameters:
- ingest_path
pathlib.Path: Path to the ingest-ready file in the ingest/ subdirectory.
- Returns:
- sofa
sofar.Sofa: SOFA object representing the Dataset data.
- _output_path(output_dir: Path, source_filename: str, output_format: str) -> Path | None
Return the canonical Path where a file-based output would be written.
Returns None for the formats ‘pyfar’, ‘numpy’, and ‘raw’. Otherwise constructs the Path from the target directory, the source filename, and the output format.
- Parameters:
- Returns:
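A minimal sketch of how such a path could be derived is shown below. The function name `output_path` and the `_EXTENSIONS` mapping are hypothetical. The `.h5` and `.sofa` extensions are the ones named in the `_to_hdf5` and `_to_sofa` entries, while keeping the source stem and swapping only the extension is an assumption about the naming scheme.

```python
from pathlib import Path
from typing import Optional

# Formats without a canonical output file, per the entry above.
_NO_FILE_FORMATS = {"pyfar", "numpy", "raw"}

# Assumed extension for each file-based format.
_EXTENSIONS = {"hdf5": ".h5", "sofa": ".sofa"}


def output_path(output_dir: Path, source_filename: str, output_format: str) -> Optional[Path]:
    """Hypothetical sketch: None for 'pyfar', 'numpy', 'raw'; otherwise a file Path."""
    if output_format in _NO_FILE_FORMATS:
        return None
    # Assumption: reuse the source stem and replace only the extension.
    # An unknown format raises KeyError in this sketch.
    extension = _EXTENSIONS[output_format]
    return output_dir / Path(source_filename).with_suffix(extension).name


hdf5_target = output_path(Path("out"), "subject_003.mat", "hdf5")
sofa_target = output_path(Path("out"), "subject_003.mat", "sofa")
numpy_target = output_path(Path("out"), "subject_003.mat", "numpy")
```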
- _process(provider_artifact: Path, ingest_path: Path, **_dataset_kwargs) -> Path
Post-process downloaded file if needed.
Override in subclass to extract, merge, or otherwise transform the downloaded data. Write the processed, ingest-ready file to ingest_path and return it.
The default implementation promotes the provider file to the ingest stage. If the provider path is a file and differs from the ingest path, it creates a hard link (or falls back to a copy) so the ingest file exists.
- Parameters:
- provider_artifact
pathlib.Path: Path to the freshly downloaded file (or download directory).
- ingest_path
pathlib.Path: Path to the ingestible file in the ingest directory.
- **_dataset_kwargs
dict: Dataset-specific parameters (unused by the default implementation).
- Returns:
- ingest_path
pathlib.Path: The processed, ingest-ready file at ingest_path.
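The default promotion step (hard link with a copy fallback) can be illustrated as follows. This is a sketch under stated assumptions rather than irdl's implementation: `promote_to_ingest` is a hypothetical name, and both creating the parent directory and skipping an ingest file that already exists are assumptions made to keep the example idempotent.

```python
import os
import shutil
import tempfile
from pathlib import Path


def promote_to_ingest(provider_artifact: Path, ingest_path: Path) -> Path:
    """Hypothetical sketch of promoting a provider file to the ingest stage."""
    if provider_artifact.is_file() and provider_artifact != ingest_path:
        ingest_path.parent.mkdir(parents=True, exist_ok=True)
        if not ingest_path.exists():  # assumption: never overwrite an existing file
            try:
                # A hard link avoids duplicating the bytes on disk.
                os.link(provider_artifact, ingest_path)
            except OSError:
                # Links can fail across filesystems or where unsupported; copy instead.
                shutil.copy2(provider_artifact, ingest_path)
    return ingest_path


with tempfile.TemporaryDirectory() as tmp:
    provider_file = Path(tmp) / "provider" / "data.sofa"
    provider_file.parent.mkdir()
    provider_file.write_bytes(b"payload")

    ingest_file = promote_to_ingest(provider_file, Path(tmp) / "ingest" / "data.sofa")
    promoted_bytes = ingest_file.read_bytes()
```

A hard link is the cheaper option because both names point at the same data on disk; the copy fallback only costs extra space when linking is not possible.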
- abstractmethod _source_filename(**dataset_kwargs) -> str
Construct the ingest-ready filename with extension for the dataset.
Override in subclass.
This name is canonical: _get treats the existence of that path as proof that download and processing are already done (if so, it skips both and ingests the file directly). The name must therefore match the file that actually ends up on disk after _download + _process, i.e. the processed file (merged/extracted), which is not necessarily the raw download.
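The cache check described above can be sketched as a lookup keyed on the canonical filename. Everything here is illustrative: `fetch_ingest_file` and `fake_download_and_process` are hypothetical names, and the real `_get` may organize these steps differently.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def fetch_ingest_file(ingest_dir: Path, source_filename: str,
                      download_and_process: Callable[[Path], Path]) -> Path:
    """Hypothetical cache check keyed on the canonical ingest filename."""
    ingest_path = ingest_dir / source_filename
    if ingest_path.exists():
        # Existence is taken as proof that download and processing already ran.
        return ingest_path
    return download_and_process(ingest_path)


calls: List[str] = []


def fake_download_and_process(ingest_path: Path) -> Path:
    # Stand-in for _download followed by _process; records each invocation.
    calls.append(ingest_path.name)
    ingest_path.parent.mkdir(parents=True, exist_ok=True)
    ingest_path.write_text("processed")
    return ingest_path


with tempfile.TemporaryDirectory() as tmp:
    ingest_dir = Path(tmp) / "ingest"
    first = fetch_ingest_file(ingest_dir, "merged.sofa", fake_download_and_process)
    second = fetch_ingest_file(ingest_dir, "merged.sofa", fake_download_and_process)
```

The second call finds `merged.sofa` on disk and skips the expensive step. If the processing step wrote its result under any other name, the check would never succeed and every call would download again, which is why the filename has to match the processed file.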
- _to_hdf5(sofa: Sofa, ingest_path: Path, output_path: Path) -> Path
Convert sofar.Sofa to HDF5 file and return Path.
- Parameters:
- sofa
sofar.Sofa: SOFA object to convert.
- ingest_path
pathlib.Path: Path to the ingestible file.
- output_path
pathlib.Path: Path where the .h5 file should be written.
- Returns:
pathlib.Path: Path to the written HDF5 file.
- _to_numpy(sofa: Sofa) -> dict
Convert sofar.Sofa to dict of numpy arrays.
- Parameters:
- sofa
sofar.Sofa: SOFA object to convert.
- Returns:
dict: Dictionary with keys:
- “impulse_response” : numpy.ndarray
- “source_coordinates” : numpy.ndarray
- “receiver_coordinates” : numpy.ndarray
- “sampling_rate” : float
- _to_output(sofa: Sofa, output_format: str, ingest_path: Path, output_path: Path | None) -> dict | Path
Convert sofar.Sofa to the requested output format.
- Parameters:
- sofa
sofar.Sofa: SOFA object to convert.
- output_format
str: One of “pyfar”, “numpy”, “hdf5”, “sofa”.
- ingest_path
pathlib.Path: Path to the ingestible file. It is also passed so that file-based exports can avoid loading the data into memory.
- output_path
pathlib.Path or None: Path where file-based outputs should be written.
- Returns:
dict or pathlib.Path: Output depends on output_format:
- “pyfar” : dict of pyfar.Signal and pyfar.Coordinates objects
- “numpy” : dict of numpy.ndarray arrays
- “hdf5” : pathlib.Path to .h5 file
- “sofa” : pathlib.Path to .sofa file
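The dispatch described in this entry could be organized along these lines. The sketch is hypothetical throughout: `SketchConverter` is not an irdl class, the `_to_*` bodies are placeholders that do no real conversion, and raising ValueError for a missing output path or an unsupported format is an assumption about error handling.

```python
from pathlib import Path
from typing import Optional, Union


class SketchConverter:
    """Hypothetical dispatcher; the _to_* bodies are placeholders."""

    def _to_pyfar(self, sofa) -> dict:
        return {"impulse_response": sofa}

    def _to_numpy(self, sofa) -> dict:
        return {"impulse_response": sofa}

    def _to_hdf5(self, sofa, ingest_path: Path, output_path: Path) -> Path:
        return output_path

    def _to_sofa(self, sofa, ingest_path: Path, output_path: Path) -> Path:
        return output_path

    def to_output(self, sofa, output_format: str, ingest_path: Path,
                  output_path: Optional[Path]) -> Union[dict, Path]:
        if output_format in ("pyfar", "numpy"):
            # In-memory formats need only the SOFA object.
            return getattr(self, f"_to_{output_format}")(sofa)
        if output_format in ("hdf5", "sofa"):
            if output_path is None:  # assumption: file formats require a target
                raise ValueError(f"{output_format!r} requires an output_path")
            return getattr(self, f"_to_{output_format}")(sofa, ingest_path, output_path)
        raise ValueError(f"Unsupported output_format: {output_format!r}")


converter = SketchConverter()
in_memory = converter.to_output("sofa-object", "numpy", Path("ingest/a.sofa"), None)
on_disk = converter.to_output("sofa-object", "hdf5", Path("ingest/a.sofa"), Path("out/a.h5"))
```

Passing `ingest_path` to the file-based converters lets a subclass copy or link the existing file instead of serializing the object again, which is what SofaBaseDataset does for ‘sofa’.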
- _to_pyfar(sofa: Sofa) -> dict
Convert sofar.Sofa to dict of pyfar objects.
- Parameters:
- sofa
sofar.Sofa: SOFA object to convert.
- Returns:
dict: Dictionary with keys:
- “impulse_response” : pyfar.Signal
- “source_coordinates” : pyfar.Coordinates
- “receiver_coordinates” : pyfar.Coordinates
- _to_sofa(sofa: Sofa, ingest_path: Path, output_path: Path) -> Path
Write sofar.Sofa to file and return Path.
- Parameters:
- sofa
sofar.Sofa: SOFA object to write.
- ingest_path
pathlib.Path: Path to the ingestible file.
- output_path
pathlib.Path: Path where the .sofa file should be written.
- Returns:
pathlib.Path: Path to the written SOFA file.
- abstractmethod _validate_params(**dataset_kwargs) -> None
Validate dataset-specific parameters.
Override in subclass. This method receives dataset-specific parameters plus output_format (so subclasses can forbid invalid output_format / dataset-parameter combinations).
- Parameters:
- **dataset_kwargs
dict: Dataset-specific parameters to validate, including output_format.
- Raises:
ValueError: If any parameter is invalid.
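As an illustration of forbidding a format/parameter combination, the sketch below validates an imaginary dataset. The `subject` parameter and the rule that ‘raw’ cannot be combined with a single subject are invented for the example and do not describe any real irdl dataset.

```python
_FORMATS = {"pyfar", "numpy", "hdf5", "sofa", "raw"}


def validate_params(**dataset_kwargs) -> None:
    """Hypothetical validator for an imaginary dataset with a 'subject' parameter."""
    output_format = dataset_kwargs.get("output_format")
    if output_format not in _FORMATS:
        raise ValueError(f"Unknown output_format: {output_format!r}")

    subject = dataset_kwargs.get("subject")
    if subject is not None and (not isinstance(subject, int) or subject < 1):
        raise ValueError("subject must be a positive integer or None")

    # Invented rule: the raw archive holds every subject, so selecting one
    # subject together with output_format='raw' is rejected.
    if output_format == "raw" and subject is not None:
        raise ValueError("output_format='raw' cannot be combined with a subject")


validate_params(output_format="numpy", subject=2)  # passes silently

try:
    validate_params(output_format="raw", subject=2)
    combination_rejected = False
except ValueError:
    combination_rejected = True
```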
- download(provider_dir: Path, **dataset_kwargs) -> Path
Download raw files and return Path to the primary artifact.
This method wraps _download to enforce provider_dir existence for all subclasses.
- Parameters:
- provider_dir
pathlib.Path: Target path where the file(s) should be downloaded to. For single-file providers, this may be the file path itself. For multi-file providers, this may be a directory where files are placed.
- **dataset_kwargs
dict: Dataset-specific parameters.
- Returns:
- provider_artifact
pathlib.Path: Path to the downloaded artifact on disk (file or directory).
- process(provider_artifact: Path, ingest_path: Path, **dataset_kwargs) -> Path
Post-process downloaded file if needed.
This method wraps _process to enforce ingest_dir existence for all subclasses.
- Parameters:
- provider_artifact
pathlib.Path: Path to the freshly downloaded file (or download directory).
- ingest_path
pathlib.Path: Path to the ingestible file in the ingest directory.
- **dataset_kwargs
dict: Dataset-specific parameters passed through to _process.
- Returns:
- ingest_path
pathlib.Path: The processed, ingest-ready file at ingest_path.
- class irdl.base.DatasetCategory(*values)
Bases: StrEnum
Categories for grouping datasets.
- HEAD_RELATED_IMPULSE_RESPONSES = 'head_related_impulse_responses'
- ROOM_IMPULSE_RESPONSES = 'room_impulse_responses'
- static _generate_next_value_(name, start, count, last_values)
Return the lower-cased version of the member name.
- class irdl.base.SofaBaseDataset
Bases: BaseDataset
Base class for datasets whose ingest-ready format is already SOFA.
The primary distinction is that output_format='sofa' can directly copy or link the ingest-ready file, which avoids writing the SOFA file out from the in-memory object.
- _ingest(ingest_path: Path) -> Sofa
Load SOFA file into sofar.Sofa object.
- Parameters:
- ingest_path
pathlib.Path: Path to the SOFA file in the ingest directory.
- Returns:
sofar.Sofa: SOFA object containing the dataset data.
- _to_sofa(sofa: Sofa, ingest_path: Path, output_path: Path) -> Path
Copy the SOFA file from the ingest directory and return Path.
- Parameters:
- sofa
sofar.Sofa: SOFA object to write.
- ingest_path
pathlib.Path: Path to the ingestible file.
- output_path
pathlib.Path: Path where the .sofa file should be written to.
- Returns:
pathlib.Path: Path to the written SOFA file.
- irdl.base._get_dataset_classes(module: ModuleType) -> list[type]
Return concrete BaseDataset subclasses exported by module.
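One plausible way to collect concrete subclasses from a module uses `inspect`. This is a sketch, not irdl's code: `SketchBase`, `get_dataset_classes`, and the demo module are hypothetical, and treating every class attribute of the module as "exported" is an assumption (the real function may, for example, consult `__all__`).

```python
import inspect
from abc import ABC, abstractmethod
from types import ModuleType
from typing import List


class SketchBase(ABC):
    """Hypothetical stand-in for BaseDataset."""

    @abstractmethod
    def _ingest(self): ...


def get_dataset_classes(module: ModuleType, base: type = SketchBase) -> List[type]:
    """Return the concrete subclasses of `base` found among the module's attributes."""
    return [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base and not inspect.isabstract(obj)
    ]


class Concrete(SketchBase):
    def _ingest(self):
        return "ingested"


class StillAbstract(SketchBase):
    pass  # does not implement _ingest, so it remains abstract


# Build a throwaway module to search.
demo_module = ModuleType("demo_datasets")
demo_module.Concrete = Concrete
demo_module.StillAbstract = StillAbstract
demo_module.SketchBase = SketchBase
demo_module.helper = lambda: None

found = get_dataset_classes(demo_module)
```

`inspect.isabstract` filters out both the base class and any subclass that still has unimplemented abstract methods, so only instantiable datasets are returned.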