API Reference
- recursive_diff.recursive_diff(lhs: Any, rhs: Any, *, rel_tol: float = 1e-09, abs_tol: float = 0.0, brief_dims: Collection[Hashable] | Literal['all'] = ()) -> Generator[str]
Compare two objects and yield all differences. The two objects must be any of:
basic types (int, float, complex, bool, str, bytes)
basic collections (list, tuple, dict, set, frozenset)
numpy scalar types
any recursive combination of the above
any other object (compared with ==)
Special treatment is reserved for specific types:
floats and ints are compared with tolerance, using math.isclose()
complex numbers are compared with tolerance, using math.isclose() separately on the real and imaginary parts
NaN compares equal to NaN
floats without decimals compare as equal to ints
complex numbers without imaginary part DO NOT compare as equal to floats, as they have substantially different behaviour
bools are only equal to other bools
numpy arrays are compared elementwise and with tolerance, also testing the dtype, using numpy.isclose(lhs, rhs) for numeric arrays and equality for other dtypes.
pandas and Xarray objects are compared elementwise, with tolerance, and without order. Duplicate indices are not supported.
Xarray dimensions and variables are compared without order
collections (list, tuple, dict, set, frozenset) are recursively descended into
generic/unknown objects are compared with ==
Custom classes can be registered to benefit from the above behaviour; see cast().
- Parameters:
lhs – left-hand-side data structure
rhs – right-hand-side data structure
rel_tol (float) – relative tolerance when comparing numbers. Applies to floats, integers, and all numpy-based data.
abs_tol (float) – absolute tolerance when comparing numbers. Applies to floats, integers, and all numpy-based data.
brief_dims –
One of:
collection of strings representing Xarray dimensions. If one or more differences are found along one of these dimensions, only one message will be reported, stating the differences count.
"all", to produce one line only for every Xarray variable that differs
Omit to output a line for every single different cell.
Yields strings containing difference messages, prepended by the path to the point that differs.
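The scalar rules listed above can be illustrated with the standard library alone. The sketch below is not the library's implementation: scalars_match and _close are hypothetical helpers written for this example, and they only cover bool, int, float, and complex values.

```python
import math


def _close(a, b, rel_tol, abs_tol):
    """Tolerance check for real numbers, with NaN treated as equal to NaN."""
    if math.isnan(a) and math.isnan(b):
        return True
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


def scalars_match(lhs, rhs, rel_tol=1e-09, abs_tol=0.0):
    """Hypothetical helper mirroring the documented scalar rules."""
    # bools are only equal to other bools (bool is a subclass of int,
    # so this check has to come first)
    if isinstance(lhs, bool) or isinstance(rhs, bool):
        return isinstance(lhs, bool) and isinstance(rhs, bool) and lhs == rhs
    # complex numbers never match floats or ints, even with a zero imaginary part
    if isinstance(lhs, complex) != isinstance(rhs, complex):
        return False
    # complex numbers: compare real and imaginary parts separately
    if isinstance(lhs, complex):
        return _close(lhs.real, rhs.real, rel_tol, abs_tol) and _close(
            lhs.imag, rhs.imag, rel_tol, abs_tol
        )
    # ints and floats are compared with tolerance, so 1 matches 1.0
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return _close(lhs, rhs, rel_tol, abs_tol)
    # anything else falls back to ==
    return lhs == rhs
```

Note that with the default abs_tol of 0.0, a value very close to zero does not match 0.0 itself; passing a non-zero abs_tol is the usual way to handle that case with math.isclose().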
- recursive_diff.recursive_eq(lhs: Any, rhs: Any, rel_tol: float = 1e-09, abs_tol: float = 0.0, *, brief_dims: Collection[Hashable] | Literal['all'] = ()) -> None
Wrapper around recursive_diff(). Print out all differences to stdout and finally assert that there are none. This is meant to be used inside pytest, where stdout is captured.
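The print-then-assert pattern described above can be sketched as follows. assert_no_diffs is a hypothetical stand-in that accepts any iterable of message strings; with the real library the messages would come from recursive_diff(). The message text used here is made up for illustration.

```python
def assert_no_diffs(messages):
    """Print every difference message, then fail if there was at least one."""
    count = 0
    for message in messages:
        print(message)  # under pytest, captured stdout is shown on failure
        count += 1
    assert count == 0, f"{count} differences found"


# No messages: the assertion passes silently.
assert_no_diffs(iter([]))

# At least one message: all of them are printed before the assertion fails.
try:
    assert_no_diffs(["[foo]: 1 != 2"])  # made-up message text
except AssertionError as exc:
    failure = str(exc)
```

Printing everything before asserting means a failing test reports every difference at once instead of stopping at the first one.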
- recursive_diff.diff_arrays(lhs: Any, rhs: Any, *, rel_tol: float = 1e-09, abs_tol: float = 0.0, brief_dims: Collection[Hashable] | Literal['all'] = ()) -> tuple[dict[str, DataFrame], list[str]]
Compare two objects with recursive_diff(). Return a tuple of:
{path: dataframe of differences} for all NumPy, Pandas, and Xarray objects found. Arrays with no differences won’t be returned.
List of all other differences found. This includes differences in metadata, shape, dtype, and indices in NumPy, Pandas, and Xarray objects.
- recursive_diff.display_diffs(lhs: Any, rhs: Any, *, rel_tol: float = 1e-09, abs_tol: float = 0.0, brief_dims: Collection[Hashable] | Literal['all'] = ()) -> None
Compare two objects with recursive_diff(). Display all differences in a Jupyter notebook, with diffs in NumPy, Pandas, and Xarray objects displayed as tables.
- recursive_diff.cast(obj: object) -> object
- recursive_diff.cast(obj: tuple) -> list
- recursive_diff.cast(obj: frozenset) -> set
- recursive_diff.cast(obj: integer) -> int
- recursive_diff.cast(obj: floating) -> float
- recursive_diff.cast(obj: complexfloating) -> complex
- recursive_diff.cast(obj: ndarray) -> dict[Any, Any]
- recursive_diff.cast(obj: Series) -> dict[str, Any]
- recursive_diff.cast(obj: DataFrame) -> dict[str, Any]
- recursive_diff.cast(obj: DataArray) -> DataArray | dict[Any, Any]
- recursive_diff.cast(obj: Dataset) -> dict[str, Any]
Helper function of recursive_diff(). Cast objects into simpler object types:
Cast tuple to list
Cast frozenset to set
Cast NumPy generics to pure-Python objects
Cast array-based objects to xarray.DataArray, as it is the most generic format that can describe all use cases:
The data will be potentially wrapped by a dict to hold the various attributes and marked so that it doesn’t trigger an infinite recursion.
Do nothing for any other object types.
See Extending recursive_diff/recursive_eq for more details.
- Parameters:
obj – complex object that must be simplified
- Returns:
simpler object to compare
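The per-type behaviour listed above (tuple to list, frozenset to set, everything else unchanged) is a type-based dispatch. One way to express that idea with the standard library is functools.singledispatch, sketched below. The simplify function and the Point class are hypothetical names for this example; the sketch does not reproduce the signature or registration mechanism of cast(), for which see Extending recursive_diff/recursive_eq.

```python
from functools import singledispatch


@singledispatch
def simplify(obj):
    """Fallback: leave unknown object types untouched."""
    return obj


@simplify.register
def _(obj: tuple):
    # tuples are compared as lists
    return list(obj)


@simplify.register
def _(obj: frozenset):
    # frozensets are compared as sets
    return set(obj)


class Point:
    """Hypothetical custom class that has no useful == of its own."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


@simplify.register
def _(obj: Point):
    # Expose the attributes as a dict so they can be compared one by one
    return {"x": obj.x, "y": obj.y}
```

Registering a handler that returns a plain dict lets a custom class reuse the comparison rules that already exist for basic collections.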
- recursive_diff.open(fname: str | Path, *, format: Literal['json', 'jsonl', 'msgpack', 'yaml', 'yml', 'netcdf', 'nc', 'zarr'] | None = None, chunks: int | dict | Literal['auto'] | None = None, netcdf_engine: str | None = None) -> Any
Open a single file from disk and return it as a recursively comparable object.
Supported file formats:
JSON (.json)
JSON Lines (.jsonl)
MessagePack (.msgpack)
YAML (.yaml, .yml)
netCDF v3/v4 (.nc, .netcdf)
Zarr v2/v3 (.zarr)
Different file formats require additional dependencies; see Optional dependencies.
For netCDF and Zarr files, this function reads the metadata into RAM; loading the actual data is delayed until later (typically until you feed the output of this function to recursive_diff() or recursive_eq()). Other file formats are loaded eagerly unless you pass the chunks parameter.
JSONL files are loaded as pure-python lists, not with pandas.read_json() or dask.dataframe.read_json(). This allows better support for mismatched keys on different lines.
- Parameters:
fname (str | pathlib.Path) – path to file
format (str) – File format. Default: infer from file extension.
chunks – Passed to xarray.open_dataset(). For files other than netCDF and Zarr, any value other than None causes the function to return a Dask delayed object.
netcdf_engine (str) – netCDF engine (see xarray.open_dataset()). Ignored for other file formats. Default: use Xarray default depending on what is available.
- Returns:
- netCDF and Zarr files:
- other files, chunks is not None:
dask.delayed.Delayed that computes to a pure-python object.
- other files, chunks=None:
a pure-python object.
The output can be passed as either the lhs or rhs argument of recursive_diff() or recursive_eq().
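To illustrate why loading JSONL into plain Python lists copes with mismatched keys, here is a minimal standard-library sketch. load_jsonl is a hypothetical helper, not the function used by open(), and the sample records are invented.

```python
import io
import json


def load_jsonl(stream):
    """Parse one JSON document per non-blank line into a plain list."""
    return [json.loads(line) for line in stream if line.strip()]


# Two records with different keys: each line stays an independent dict,
# so nothing needs to be aligned into common columns.
sample = io.StringIO('{"id": 1, "name": "a"}\n{"id": 2, "extra": [1, 2]}\n')
rows = load_jsonl(sample)
```

Because each line is an ordinary dict inside an ordinary list, the result falls under the basic collections that are recursively descended into.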
- recursive_diff.recursive_open(path: str, patterns: str | Collection[str] = ('**/*.json', '**/*.jsonl', '**/*.msgpack', '**/*.yaml', '**/*.yml', '**/*.nc', '**/*.zarr'), *, format: Literal['json', 'jsonl', 'msgpack', 'yaml', 'yml', 'netcdf', 'nc', 'zarr'] | None = None, chunks: int | dict | Literal['auto'] | None = None, netcdf_engine: str | None = None) -> dict[str, Any]
Recursively find and open all supported files that exist in any of the given local paths. See open() for supported file formats.
- Parameters:
path (str) – Root directory to search
patterns (str | list[str]) – One or more glob patterns relative to path
format (str) – File format. Default: infer from file extension.
chunks – Passed to xarray.open_dataset(). For files other than netCDF and Zarr, any value other than None causes the function to return a Dask delayed object.
netcdf_engine (str) – netCDF engine (see xarray.open_dataset()). Ignored for other file formats. Default: use Xarray default depending on what is available.
- Returns:
dict of {file name relative to path: file contents}, which can be passed as either the lhs or rhs argument of recursive_diff() or recursive_eq().
Thread-safety note: this function is not thread-safe on Python 3.9.
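How glob patterns relative to a root directory turn into relative file names can be sketched with pathlib. find_files is a hypothetical helper for this example and only lists matching names; it does not open anything and is not how recursive_open() is implemented. The directory layout is created in a temporary directory purely for the demonstration.

```python
import tempfile
from pathlib import Path


def find_files(root, patterns=("**/*.json", "**/*.yaml")):
    """Return the sorted names, relative to root, matching any pattern."""
    root = Path(root)
    if isinstance(patterns, str):
        patterns = [patterns]  # accept a single pattern as well as a collection
    found = set()
    for pattern in patterns:
        for match in root.glob(pattern):
            # Keys are relative to the root, with forward slashes
            found.add(match.relative_to(root).as_posix())
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.json").write_text("{}")
    (Path(tmp) / "sub" / "b.yaml").write_text("x: 1")
    (Path(tmp) / "sub" / "notes.txt").write_text("ignored")
    result = find_files(tmp)
```

In pathlib, the `**` component matches zero or more directories, so a pattern such as `**/*.json` also picks up files directly under the root.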