ncdiff

Compare either two NetCDF files or all NetCDF files in two directories.

Usage

usage: ncdiff.py [-h]
                 [--engine {netcdf4,scipy,pydap,h5netcdf,pynio,cfgrib,pseudonetcdf}]
                 [--quiet] [--recursive] [--match MATCH] [--rtol RTOL]
                 [--atol ATOL] [--brief_dims DIM [DIM ...] | --brief]
                 lhs rhs

Compare either two NetCDF files or all NetCDF files in two directories.

positional arguments:
  lhs                   Left-hand-side NetCDF file or (if --recursive) directory
  rhs                   Right-hand-side NetCDF file or (if --recursive) directory

optional arguments:
  -h, --help            show this help message and exit
  --engine {netcdf4,scipy,pydap,h5netcdf,pynio,cfgrib,pseudonetcdf},
  -e {netcdf4,scipy,pydap,h5netcdf,pynio,cfgrib,pseudonetcdf}
                        NetCDF engine (may require additional modules)
  --quiet, -q           Suppress logging
  --recursive, -r       Compare all NetCDF files with matching names in two directories
  --match MATCH, -m MATCH
                        Bash wildcard match for file names when using --recursive (default: **/*.nc)
  --rtol RTOL           Relative comparison tolerance (default: 1e-9)
  --atol ATOL           Absolute comparison tolerance (default: 0)
  --brief_dims DIM [DIM ...]
                        Just count differences along one or more dimensions instead of printing them out individually
  --brief, -b           Just count differences for every variable instead of printing them out individually

Examples:

Compare two NetCDF files:
  ncdiff a.nc b.nc
Compare all NetCDF files with identical names in two directories:
  ncdiff -r dir1 dir2
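
The --rtol and --atol options listed above decide when two numbers count as different. The following is a minimal sketch of such a test, assuming the tool follows the numpy.isclose convention (an assumption, not something the help text states). The function name is_close is hypothetical; the defaults mirror the ones shown in the usage.

```python
def is_close(lhs: float, rhs: float, rtol: float = 1e-9, atol: float = 0.0) -> bool:
    """Return True if lhs and rhs are equal within tolerance.

    Assumed convention (same as numpy.isclose, not confirmed for ncdiff):
        |lhs - rhs| <= atol + rtol * |rhs|
    """
    return abs(lhs - rhs) <= atol + rtol * abs(rhs)


# A tiny relative error is tolerated with the default rtol.
print(is_close(1.0, 1.0 + 1e-12))

# With atol=0, values near zero only match if they are almost exactly equal,
# because rtol * |rhs| shrinks together with rhs.
print(is_close(0.0, 1e-12))
print(is_close(0.0, 1e-12, atol=1e-9))
```

Under this convention, a non-zero --atol is mainly useful for data that is expected to hover around zero, where a purely relative tolerance would flag noise as differences.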

Chunking and RAM design

This tool does not support chunked datasets, or loading only part of a large dataset into memory at a time. Instead, the files of a chunked dataset are loaded individually. One variable at a time is then loaded into memory in full, compared, and discarded.
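
The load, compare, discard cycle described above can be sketched as follows. This is an illustration of the memory pattern only, not the tool's actual code: Loader and diff_counts are hypothetical names, and plain Python lists stand in for variables read from a NetCDF file.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical stand-in for "read this variable fully from disk".
Loader = Callable[[], List[float]]


def diff_counts(
    lhs: Dict[str, Loader], rhs: Dict[str, Loader]
) -> Iterator[Tuple[str, int]]:
    """Yield (variable name, number of differing elements), one variable at a time."""
    for name in sorted(lhs.keys() & rhs.keys()):
        a = lhs[name]()  # the whole variable is now in memory
        b = rhs[name]()
        # Count element-wise mismatches, plus any difference in length.
        n_diff = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
        del a, b  # discard before the next variable is loaded
        yield name, n_diff


lhs = {"t": lambda: [1.0, 2.0, 3.0], "p": lambda: [1.0]}
rhs = {"t": lambda: [1.0, 2.5, 3.0], "p": lambda: [1.0]}
for name, n_diff in diff_counts(lhs, rhs):
    print(name, n_diff)
```

Peak memory in this pattern is bounded by the largest single variable on each side, not by the size of the whole file.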

This has the big advantage of simplicity, but a few disadvantages:

  • No option to compare datasets with mismatched prefixes (e.g. foo.*.nc vs. bar.*.nc).

  • No option to compare chunked datasets that differ only in chunking.

  • Slower, as variables that don’t sit on the concat_dim are loaded again for every file, with no option to skip them. See also xarray#2039.

  • Huge RAM usage for monolithic variables, since each variable is loaded in full.
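
To make the first limitation concrete, here is a rough sketch of pairing files by identical relative path, which is how --recursive is described in the usage. The function name pair_files is hypothetical and the tool's real matching logic may differ; the default pattern is the one documented for --match.

```python
from pathlib import Path
from typing import List, Tuple


def pair_files(
    lhs_dir: Path, rhs_dir: Path, match: str = "**/*.nc"
) -> Tuple[List[str], List[str], List[str]]:
    """Return (paths in both, paths only in lhs, paths only in rhs).

    Paths are relative to their directory, so two files are paired
    only if their relative names are identical.
    """
    lhs = {
        p.relative_to(lhs_dir).as_posix()
        for p in Path(lhs_dir).glob(match)
        if p.is_file()
    }
    rhs = {
        p.relative_to(rhs_dir).as_posix()
        for p in Path(rhs_dir).glob(match)
        if p.is_file()
    }
    return sorted(lhs & rhs), sorted(lhs - rhs), sorted(rhs - lhs)
```

With name-based pairing of this kind, dir1/foo.0.nc and dir2/bar.0.nc never end up in the first list, so they would be reported as files present on one side only instead of being compared with each other.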

Further limitations

  • Won’t compare NetCDF settings such as store version, compression, or chunking.

  • Doesn’t support indices with duplicate elements.