API Reference

recursive_diff.recursive_diff(lhs: Any, rhs: Any, *, rel_tol: float = 1e-09, abs_tol: float = 0.0, brief_dims: Collection[Hashable] | Literal['all'] = ()) → Generator[str]

Compare two objects and yield all differences. The two objects may be any of the types listed below, or any recursive combination of them.

Special treatment is reserved for different types:

  • floats and ints are compared with tolerance, using math.isclose()

  • complex numbers are compared with tolerance, using math.isclose() separately on the real and imaginary parts

  • NaN is equal to NaN

  • floats without decimals compare as equal to ints

  • complex numbers without imaginary part DO NOT compare as equal to floats, as they have substantially different behaviour

  • bools are only equal to other bools

  • numpy arrays are compared elementwise and with tolerance, also testing the dtype, using numpy.isclose(lhs, rhs) for numeric arrays and equality for other dtypes.

  • pandas and Xarray objects are compared elementwise, with tolerance, and without order. Duplicate indices are not supported.

  • Xarray dimensions and variables are compared without order

  • collections (list, tuple, dict, set, frozenset) are recursively descended into

  • generic/unknown objects are compared with ==

Custom classes can be registered to benefit from the above behaviour; see cast().
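The scalar rules above can be illustrated with a minimal standard-library sketch. It is not the library's implementation; scalars_equal and _close are hypothetical names introduced here only to make the rules concrete.

```python
import math


def _close(a, b, rel_tol, abs_tol):
    """Tolerance comparison where NaN is treated as equal to NaN."""
    if math.isnan(a) and math.isnan(b):
        return True
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


def scalars_equal(lhs, rhs, rel_tol=1e-09, abs_tol=0.0):
    """Hypothetical helper mirroring the scalar rules listed above."""
    # bools are only equal to other bools (bool is a subclass of int,
    # so this check must come first)
    if isinstance(lhs, bool) or isinstance(rhs, bool):
        return isinstance(lhs, bool) and isinstance(rhs, bool) and lhs == rhs
    # ints and floats are compared with tolerance; 1 and 1.0 are equal
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return _close(lhs, rhs, rel_tol, abs_tol)
    # complex numbers: tolerance applied separately to real and imaginary parts
    if isinstance(lhs, complex) and isinstance(rhs, complex):
        return _close(lhs.real, rhs.real, rel_tol, abs_tol) and _close(
            lhs.imag, rhs.imag, rel_tol, abs_tol
        )
    # a complex number never equals a float or int, even with zero imaginary part
    if isinstance(lhs, complex) or isinstance(rhs, complex):
        return False
    # generic/unknown objects fall back to ==
    return lhs == rhs


print(scalars_equal(1, 1.0))
print(scalars_equal(True, 1))
print(scalars_equal(1 + 0j, 1.0))
```

Under these assumptions the three calls print True, False, and False, in that order.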

Parameters:
  • lhs – left-hand-side data structure

  • rhs – right-hand-side data structure

  • rel_tol (float) – relative tolerance when comparing numbers. Applies to floats, integers, and all numpy-based data.

  • abs_tol (float) – absolute tolerance when comparing numbers. Applies to floats, integers, and all numpy-based data.

  • brief_dims

    One of:

    • collection of strings representing Xarray dimensions. If one or more differences are found along one of these dimensions, only one message will be reported, stating the number of differences.

    • "all", to produce only one line for every Xarray variable that differs

    Omit to output one line for every differing cell.

Yields strings containing difference messages, prepended by the path to the point that differs.
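A minimal usage sketch follows. Since the function is a generator, it has to be iterated to obtain the messages. The sample data and the collect_diffs helper are hypothetical; the import is guarded so that the snippet still runs where the package is not installed, in which case no differences are collected.

```python
try:
    from recursive_diff import recursive_diff
except ImportError:  # package not installed: the sketch degrades to a no-op
    recursive_diff = None

# Hypothetical sample data: the second list element differs by about 1e-10,
# the third by 1.
lhs = {"name": "run-1", "values": [1, 2.0, 3]}
rhs = {"name": "run-1", "values": [1, 2.0000000001, 4]}


def collect_diffs(lhs, rhs, **kwargs):
    """Hypothetical helper: materialise the generator into a list of strings."""
    if recursive_diff is None:
        return []
    return list(recursive_diff(lhs, rhs, **kwargs))


diffs = collect_diffs(lhs, rhs)
for message in diffs:
    print(message)
```

With the default rel_tol, the second list element should fall within tolerance, so only the third is expected to be reported. The exact wording of the messages is not shown here because it is not specified in this reference.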

recursive_diff.recursive_eq(lhs: Any, rhs: Any, rel_tol: float = 1e-09, abs_tol: float = 0.0, *, brief_dims: Collection[Hashable] | Literal['all'] = ()) → None

Wrapper around recursive_diff().

Print out all differences to stdout and finally assert that there are none. This is meant to be used inside pytest, where stdout is captured.
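A sketch of how this might look inside a pytest test module. build_report is a hypothetical function under test, and the import is guarded only so that the snippet runs where the package is absent.

```python
try:
    from recursive_diff import recursive_eq
except ImportError:  # package not installed: the test body becomes a no-op
    recursive_eq = None


def build_report():
    """Hypothetical function under test."""
    return {"total": 0.1 + 0.2, "items": (1, 2, 3)}


def test_build_report():
    expected = {"total": 0.3, "items": (1, 2, 3)}
    if recursive_eq is None:
        return
    # 0.1 + 0.2 is not exactly 0.3, but it is within the default rel_tol,
    # so no difference is expected and no assertion should fire.
    recursive_eq(expected, build_report())
```

If the two structures did differ, each difference would be printed first, so pytest would show the full list in the captured stdout of the failing test.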

recursive_diff.diff_arrays(lhs: Any, rhs: Any, *, rel_tol: float = 1e-09, abs_tol: float = 0.0, brief_dims: Collection[Hashable] | Literal['all'] = ()) → tuple[dict[str, DataFrame], list[str]]

Compare two objects with recursive_diff().

Return tuple of:

  • {path: dataframe of differences} for all NumPy, Pandas, and Xarray objects found. Arrays with no differences won’t be returned.

  • List of all other differences found. This includes differences in metadata, shape, dtype, and indices in NumPy, Pandas, and Xarray objects.

recursive_diff.display_diffs(lhs: Any, rhs: Any, *, rel_tol: float = 1e-09, abs_tol: float = 0.0, brief_dims: Collection[Hashable] | Literal['all'] = ()) → None

Compare two objects with recursive_diff().

Display all differences in a Jupyter notebook, with diffs in NumPy, Pandas, and Xarray objects displayed as tables.

recursive_diff.cast(obj: object) → object
recursive_diff.cast(obj: tuple) → list
recursive_diff.cast(obj: frozenset) → set
recursive_diff.cast(obj: integer) → int
recursive_diff.cast(obj: floating) → float
recursive_diff.cast(obj: complexfloating) → complex
recursive_diff.cast(obj: ndarray) → dict[Any, Any]
recursive_diff.cast(obj: Series) → dict[str, Any]
recursive_diff.cast(obj: DataFrame) → dict[str, Any]
recursive_diff.cast(obj: DataArray) → DataArray | dict[Any, Any]
recursive_diff.cast(obj: Dataset) → dict[str, Any]

Helper function of recursive_diff().

Cast objects into simpler object types, as listed in the signatures above (for example, tuple to list and frozenset to set).

The data may be wrapped in a dict that holds the various attributes, and marked so that it doesn’t trigger an infinite recursion.

Do nothing for any other object types.

See Extending recursive_diff/recursive_eq for more details.

Parameters:

obj – complex object that must be simplified

Returns:

simpler object to compare
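The idea of dispatching on type to obtain a simpler object can be sketched with functools.singledispatch from the standard library. This illustrates the technique only; simplify and Rectangle are hypothetical names, and the sketch makes no claim about how cast() itself is implemented or how classes are registered with it.

```python
from dataclasses import dataclass
from functools import singledispatch


@singledispatch
def simplify(obj):
    """Hypothetical stand-in for a cast-like helper: unknown types pass through."""
    return obj


@simplify.register
def _(obj: tuple):
    # tuples become lists
    return list(obj)


@simplify.register
def _(obj: frozenset):
    # frozensets become sets
    return set(obj)


@dataclass
class Rectangle:
    """Hypothetical custom class."""
    w: float
    h: float


@simplify.register
def _(obj: Rectangle):
    # a custom class is reduced to a dict of the attributes worth comparing
    return {"w": obj.w, "h": obj.h}


print(simplify((1, 2)))
print(simplify(Rectangle(2, 3)))
```

Once a custom class is reduced to a dict like this, its attributes can be compared one by one with the rules for collections and numbers.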

recursive_diff.open(fname: str | Path, *, format: Literal['json', 'jsonl', 'msgpack', 'yaml', 'yml', 'netcdf', 'nc', 'zarr'] | None = None, chunks: int | dict | Literal['auto'] | None = None, netcdf_engine: str | None = None) → Any

Open a single file from disk and return it as a recursively comparable object.

Supported file formats:

  • JSON (.json)

  • JSON Lines (.jsonl)

  • MessagePack (.msgpack)

  • YAML (.yaml, .yml)

  • netCDF v3/v4 (.nc, .netcdf)

  • Zarr v2/v3 (.zarr)

Different file formats require additional dependencies; see Optional dependencies.

For netCDF and Zarr files, this function reads the metadata into RAM; loading the actual data is delayed until later (typically until you feed the output of this function to recursive_diff() or recursive_eq()). Other file formats are loaded eagerly unless you pass the chunks parameter.

JSONL files are loaded as pure-python lists, not with pandas.read_json() or dask.dataframe.read_json(). This allows better support for mismatched keys on different lines.
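To make "pure-python lists" concrete, here is a minimal standard-library sketch of reading a JSON Lines file into a list with one parsed value per line. load_jsonl and the sample file are hypothetical, and this is not the library's loader.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(fname):
    """Hypothetical helper: one parsed JSON value per non-empty line."""
    with open(fname, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "records.jsonl"
    # The two lines deliberately have different keys ("x" vs. "y").
    path.write_text(
        '{"id": 1, "x": 1.5}\n{"id": 2, "y": "only here"}\n',
        encoding="utf-8",
    )
    records = load_jsonl(path)

print(records)
```

Each line keeps exactly the keys it had in the file, so a key that appears on only some lines stays a missing key and is not filled in with a placeholder value, as a tabular loader might do.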

Parameters:
  • fname (str | pathlib.Path) – path to file

  • format (str) – File format. Default: infer from file extension.

  • chunks – Passed to xarray.open_dataset(). For files other than netCDF and Zarr, any value other than None causes the function to return a Dask delayed object.

  • netcdf_engine (str) – netCDF engine (see xarray.open_dataset()). Ignored for other file formats. Default: use Xarray default depending on what is available.

Returns:

netCDF and Zarr files:

xarray.Dataset

other files, chunks is not None:

dask.delayed.Delayed that computes to a pure-python object.

other files, chunks=None:

a pure-python object.

The output can be passed as either the lhs or rhs argument of recursive_diff() or recursive_eq().

recursive_diff.recursive_open(path: str, patterns: str | Collection[str] = ('**/*.json', '**/*.jsonl', '**/*.msgpack', '**/*.yaml', '**/*.yml', '**/*.nc', '**/*.zarr'), *, format: Literal['json', 'jsonl', 'msgpack', 'yaml', 'yml', 'netcdf', 'nc', 'zarr'] | None = None, chunks: int | dict | Literal['auto'] | None = None, netcdf_engine: str | None = None) → dict[str, Any]

Recursively find and open all supported files under path that match any of the given patterns. See open() for supported file formats.

Parameters:
  • path (str) – Root directory to search into

  • patterns (str | list[str]) – One or more glob patterns relative to path

  • format (str) – File format. Default: infer from file extension.

  • chunks – Passed to xarray.open_dataset(). For files other than netCDF and Zarr, any value other than None causes the function to return a Dask delayed object.

  • netcdf_engine (str) – netCDF engine (see xarray.open_dataset()). Ignored for other file formats. Default: use Xarray default depending on what is available.

Returns:

dict of {file name relative to path: file contents}, which can be passed as either the lhs or rhs argument of recursive_diff() or recursive_eq().

Thread-safety note: this function is not thread-safe on Python 3.9.
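How glob patterns relative to a root directory map to keys of the form "file name relative to path" can be sketched with pathlib alone. find_files is a hypothetical helper that only discovers plain files; it does not open them, and using forward slashes in the keys is an assumption of this sketch, not documented library behaviour.

```python
import tempfile
from pathlib import Path


def find_files(root, patterns=("**/*.json", "**/*.yaml")):
    """Hypothetical helper: map relative file names to their full paths."""
    if isinstance(patterns, str):
        patterns = [patterns]  # accept a single pattern as well as a collection
    root = Path(root)
    found = {}
    for pattern in patterns:
        for p in root.glob(pattern):
            if p.is_file():  # plain files only in this sketch
                found[p.relative_to(root).as_posix()] = p
    return dict(sorted(found.items()))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.json").write_text("{}")
    (root / "sub" / "b.yaml").write_text("k: v\n")
    (root / "sub" / "notes.txt").write_text("ignored")
    names = list(find_files(root))

print(names)
```

Here the "**" component matches files both directly under the root and in subdirectories, while notes.txt is skipped because no pattern matches it.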