What’s New

v2.1.0 (2026-03-12)

  • New function diff_arrays() that returns all differences in NumPy, Pandas, and Xarray objects as dataframes instead of text messages

  • New function display_diffs() that displays differences in Jupyter notebooks in tabular format

  • Fixed issues that would cause slowdowns and excessive RAM usage when comparing Dask arrays with 2+ dimensions using a distributed scheduler

  • Added support for P2P rechunk in Dask distributed

v2.0.0 (2026-02-22)

This release completely overhauls the diff engine, yielding speed-ups of up to 40x when comparing array data and 2.4x when comparing JSON-like data. It also adds support for reading JSON, JSONL, MessagePack, YAML, and Zarr files from disk.

  • Added optional pip dependency [all] to install all dependencies needed to open files on disk

  • New functions open() and recursive_open() for opening files from the Python API, supporting the JSON, JSONL, MessagePack, YAML, netCDF, and Zarr file formats

  • recursive_eq() now supports the brief_dims argument

  • Dask-backed Xarray objects are now compared chunk by chunk instead of loading an entire pair of variables into memory at once

  • Added support for Dask delayed objects

  • Multiple Dask objects are now compared in parallel using all available Dask threads, instead of one variable at a time. This should result in speed-ups, as disk reads are pipelined with comparisons. However, it can also increase memory usage, depending on available CPUs and disk read speeds; you can limit it with dask.config.set({"num_workers": 2}) or a similarly low number.

  • Added fast path for when lhs and rhs share some objects

  • Added fast path for identical builtin non-container objects

  • Added support for complex numbers

  • Added support for pandas.DataFrame with different dtypes for different columns

  • pandas.Index diffs are now much faster and retain the original order, instead of being sorted alphabetically

  • pandas.Index comparisons now check dtypes

  • pandas.MultiIndex comparisons no longer check names

  • Added support for datetime objects too large for M8[ns] (before year 1677 or after year 2262)

  • Dropped support for pynio, cfgrib, and pseudonetcdf netCDF engines
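
The year limits quoted above for M8[ns] follow from storing a count of nanoseconds since 1970-01-01 in a signed 64-bit integer. The sketch below reproduces that arithmetic with the standard library only. The helper name ns_bound is hypothetical, and Python's datetime has microsecond resolution, so the last three digits of each bound are dropped.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest nanosecond offset an int64 can hold


def ns_bound(offset_ns: int) -> datetime:
    """Convert a nanosecond offset from the epoch to a datetime.

    datetime only has microsecond resolution, so sub-microsecond
    digits are dropped by the floor division.
    """
    return EPOCH + timedelta(microseconds=offset_ns // 1000)


# -2**63 is reserved for NaT (not-a-time) in NumPy, so the smallest
# valid offset is -INT64_MAX.
lower = ns_bound(-INT64_MAX)
upper = ns_bound(INT64_MAX)

print(lower.year, upper.year)  # prints: 1677 2262
```

Dates outside this window cannot be represented at nanosecond resolution at all, which is why they need separate handling.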

CLI changes

  • The ncdiff CLI tool has been deprecated in favor of the new recursive-diff

  • In addition to netCDF, the recursive-diff CLI tool also supports JSON, JSONL, MessagePack, YAML, and Zarr files, and compares them by default

  • The recursive-diff CLI tool supports multiple wildcard patterns, e.g.:

    recursive-diff -r -m "foo*.nc" "bar*.nc" -- dir1 dir2
    

    Note

    This new feature implies a syntax difference between the legacy ncdiff CLI tool and the new recursive-diff:

    ncdiff -r -m "foo*.nc" dir1 dir2  # valid
    recursive-diff -r -m "foo*.nc" dir1 dir2  # NOT VALID
    recursive-diff -r -m "foo*.nc" "bar*.nc" -- dir1 dir2  # valid (note the --)
    recursive-diff -r dir1 dir2 -m "foo*.nc"  # valid
    
  • The CLI tool no longer requires Dask to be installed. Installing Dask remains recommended, as it reduces memory usage and speeds up the comparison.
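
The -- separator described in the note above is typical of any option that accepts a variable number of values. The following is a minimal sketch using the standard argparse module, not the tool's actual parser: the program name, the destination names, and the try_parse helper are all assumptions made for illustration. It shows why a multi-value -m swallows the directory arguments unless -- or a different argument order marks where the patterns end.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser; everything except the -r and -m
    # flags themselves is an assumption, not the real tool's definition.
    parser = argparse.ArgumentParser(prog="recursive-diff-sketch")
    parser.add_argument("-r", dest="recursive", action="store_true")
    # nargs="+" lets -m take one or more patterns, which also makes it
    # greedy: it keeps consuming arguments until the next option or "--".
    parser.add_argument("-m", dest="match", nargs="+", default=[])
    parser.add_argument("lhs")
    parser.add_argument("rhs")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse reports the error on stderr and then exits
        return None


# "--" ends the pattern list, so dir1 and dir2 become lhs and rhs
ok = try_parse(["-r", "-m", "foo*.nc", "bar*.nc", "--", "dir1", "dir2"])
# without "--", -m consumes dir1 and dir2 too, leaving lhs and rhs unset
bad = try_parse(["-r", "-m", "foo*.nc", "dir1", "dir2"])
# placing -m last also works, since nothing follows the patterns
trailing = try_parse(["-r", "dir1", "dir2", "-m", "foo*.nc"])

print(ok.match, ok.lhs, ok.rhs)
print(bad)
print(trailing.match, trailing.lhs, trailing.rhs)
```

In this sketch the three calls mirror the three recursive-diff invocations in the note: the first and third parse, while the second is rejected because no positional arguments remain.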

Breaking changes

  • cast() no longer accepts the brief_dims argument

v1.3.0 (2025-10-14)

  • Test against Python 3.13 and 3.14

  • Test against recent Pandas versions (tested up to 3.0 beta)

  • Detect and handle recursion in data structures (#24)

  • Fixed warnings in recent Pandas versions (#27)

  • Bumped up minimum versions for all dependencies:

    Dependency  v1.2.0  v1.3.0
    ----------  ------  --------
    python      3.8     3.9
    dask        2.0     2022.7.0
    numpy       1.16    1.23
    pandas      0.25    1.5
    xarray      0.12    2023.8.0
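
Recursion in data structures, mentioned in the list above, means a container that directly or indirectly contains itself, which would make a naive recursive traversal loop forever. The following is a minimal sketch of one common way to detect it, by tracking the ids of the containers currently being visited. The walk helper and the "<recursion>" marker are hypothetical and are not taken from the library.

```python
from typing import Any, Iterator, Tuple


def walk(
    obj: Any, path: str = "", _active: frozenset = frozenset()
) -> Iterator[Tuple[str, Any]]:
    """Yield (path, leaf) pairs from nested dicts and lists.

    _active holds the ids of the containers on the current branch only,
    so an object that merely appears twice is not reported as recursive.
    """
    if isinstance(obj, (dict, list)):
        if id(obj) in _active:
            # obj is one of its own ancestors: stop instead of looping
            yield path, "<recursion>"
            return
        active = _active | {id(obj)}
        items = obj.items() if isinstance(obj, dict) else enumerate(obj)
        for key, value in items:
            yield from walk(value, f"{path}[{key!r}]", active)
    else:
        yield path, obj


data = {"a": [1, 2]}
data["self"] = data  # the dict now contains itself

result = list(walk(data))
print(result)
```

Tracking only the current branch, rather than every object seen so far, keeps shared but non-recursive objects from being flagged.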

v1.2.0 (2024-03-16)

  • Added support for Python 3.11 and 3.12

  • Added support for recent Pandas versions (tested up to 2.2)

v1.1.0 (2022-03-26)

  • Added support for Python 3.8, 3.9, and 3.10

  • Added type annotations

  • Added support for pandas 1.0

  • This project now adheres to NEP-29; see Minimum dependency versions. Bumped up minimum versions for all dependencies:

    Dependency  v1.0.0  v1.1.0
    ----------  ------  ------
    python      3.5.0   3.8
    dask        0.19.0  2.0
    numpy       1.13    1.16
    pandas      0.21    0.25
    xarray      0.10.1  0.12

  • Now using setuptools-scm for versioning

  • Migrated CI from Travis + AppVeyor + coveralls to GitHub Actions + codecov.io

  • Added static code checkers (black, isort, absolufy_imports, flake8, mypy) to CI, wrapped by pre-commit

v1.0.0 (2019-01-02)

Initial release, split out from xarray-extras v0.3.0.