Exposing latest netcdf 4.9.x library functionality: quantize, zstandard #725

Closed
durack1 opened this issue Jan 17, 2024 · 48 comments · Fixed by #751

@durack1 (Contributor) commented Jan 17, 2024

The latest versions of libnetcdf include new functions to further squash data using lossy compression, see Charlie Zender: Why & How to Increase Dataset Compression in CMIP7 - in particular the quantize and zstandard operations.

How easy is it to expose this in CMOR 3.9?

ping @taylor13 @matthew-mizielinski @sashakames @piotrflorek @czender

Also see discussion #724
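
For reference, a minimal sketch of the two netCDF-c 4.9.x calls in question, wired into a single variable definition; the quantize mode, bit count, and compression level below are illustrative choices, not CMOR defaults:

#include <netcdf.h>
#include <netcdf_filter.h>   /* declares nc_def_var_zstandard() */

/* Sketch (not CMOR source): apply lossy quantization plus Zstandard
 * to one variable, in define mode, before any data are written. */
int compress_var(int ncid, int varid)
{
    int status;

    /* BitRound quantization keeping 9 significant bits (illustrative). */
    if ((status = nc_def_var_quantize(ncid, varid,
                                      NC_QUANTIZE_BITROUND, 9)) != NC_NOERR)
        return status;

    /* Zstandard lossless compression at level 3 (illustrative). */
    if ((status = nc_def_var_zstandard(ncid, varid, 3)) != NC_NOERR)
        return status;

    return NC_NOERR;
}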

durack1 added this to the 3.9.0 milestone Apr 7, 2024
@taylor13 (Collaborator) commented May 6, 2024

There is ongoing discussion on how Charlie Z's approach might get standardized under CF. See cf-convention/cf-conventions#403. Not sure if the above is advocating that or not.

@mauzey1 (Collaborator) commented Jun 11, 2024

I was able to add nc_def_var_quantize to the code without issue (placed above lines that call nc_def_var_deflate). However, I'm encountering an issue when using nc_def_var_zstandard. Some of the NetCDF files produced by the CMOR tests are giving the following error when you try to read them with ncdump.

$ ncdump data.nc
...
NetCDF: HDF error
Location: file ?; fcn ? line 478
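
For concreteness, the placement described above amounts to something like the following in define mode (a sketch, not the actual CMOR source; the deflate settings mirror the _DeflateLevel = 1 attribute visible in the dumps below, and the quantize bit count and zstandard level are placeholders):

#include <netcdf.h>
#include <netcdf_filter.h>

/* Sketch of the ordering described above: quantize declared first,
 * then the pre-existing deflate call, then zstandard. */
static int define_filters(int ncid, int varid)
{
    int status;
    if ((status = nc_def_var_quantize(ncid, varid,
                                      NC_QUANTIZE_BITROUND, 9)) != NC_NOERR)
        return status;   /* 9 significant bits is a placeholder */
    if ((status = nc_def_var_deflate(ncid, varid, 0 /*shuffle*/,
                                     1 /*deflate*/, 1 /*level*/)) != NC_NOERR)
        return status;
    return nc_def_var_zstandard(ncid, varid, 0 /*level, as in "32015,0"*/);
}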

@durack1 (Contributor, Author) commented Jun 11, 2024

@mauzey1 great to see progress here! Do you need an updated version of ncdump that can deal with the zstandard-compressed data? I am unfamiliar with all these new functions, so I'm not sure I'll be all that helpful. Charlie Zender (@czender) might be able to guide use of the new features somewhat, or point us at some docs describing best-practice use.

@taylor13 (Collaborator)

Yes, I think Charlie could save us much time if he can respond. He's always been very helpful in the past. If we don't hear from him in, say, a week, let's discuss further.

@czender commented Jun 11, 2024

Hi All. @mauzey1 the ncdump symptom you report above could indeed be due to using an ncdump that is not linked to the zstandard library, as suggested by @durack1. First, please use the nc-config tool built with the same netCDF installation that created your ncdump and post the results of nc-config --all. Second, please post the results of ncdump -h -s foo.nc where foo.nc is the problematic dataset.

@mauzey1 (Collaborator) commented Jun 11, 2024

@czender

Below is the output of nc-config --all

$ nc-config --all

This netCDF 4.9.2 has been built with the following features: 

  --cc            -> x86_64-apple-darwin13.4.0-clang
  --cflags        -> -I/Users/mauzey1/opt/anaconda3/envs/cmor_dev/include
  --libs          -> -L/Users/mauzey1/opt/anaconda3/envs/cmor_dev/lib -lnetcdf
  --static        -> -lmfhdf -ldf -lhdf5_hl -lhdf5 -lcrypto -lcurl -lpthread -lsz -lz -ldl -lm -lzip -lblosc -lzstd -lbz2 -lxml2


  --has-dap          -> yes
  --has-dap2         -> yes
  --has-dap4         -> yes
  --has-nc2          -> yes
  --has-nc4          -> yes
  --has-hdf5         -> yes
  --has-hdf4         -> yes
  --has-logging      -> no
  --has-pnetcdf      -> no
  --has-szlib        -> no
  --has-cdf5         -> yes
  --has-parallel4    -> no
  --has-parallel     -> no
  --has-nczarr       -> yes
  --has-zstd         -> yes
  --has-benchmarks   -> no
  --has-multifilters -> no
  --has-stdfilters   -> deflate szip blosc zstd bzip2
  --has-quantize     -> no

  --prefix        -> /Users/mauzey1/opt/anaconda3/envs/cmor_dev
  --includedir    -> /Users/mauzey1/opt/anaconda3/envs/cmor_dev/include
  --libdir        -> /Users/mauzey1/opt/anaconda3/envs/cmor_dev/lib
  --plugindir     -> /Users/mauzey1/opt/anaconda3/envs/cmor_dev/lib/hdf5/plugin
  --version       -> netCDF 4.9.2

Below is the output of ncdump -h -s on a file with this problem.

$ ncdump -h -s CMIP6/CMIP6/ISMIP6/PCMDI/PCMDI-test-1-0/piControl-withism/r3i1p1f1/Amon/ta/gn/v20240610/ta_Amon_PCMDI-test-1-0_piControl-withism_r3i1p1f1_gn_200801-200806.nc 
netcdf ta_Amon_PCMDI-test-1-0_piControl-withism_r3i1p1f1_gn_200801-200806 {
dimensions:
        time = UNLIMITED ; // (6 currently)
        plev = 19 ;
        lat = 90 ;
        lon = 180 ;
        bnds = 2 ;
variables:
        double time(time) ;
                time:bounds = "time_bnds" ;
                time:units = "days since 2008" ;
                time:calendar = "360_day" ;
                time:axis = "T" ;
                time:long_name = "time" ;
                time:standard_name = "time" ;
                time:_Storage = "chunked" ;
                time:_ChunkSizes = 512 ;
                time:_Endianness = "little" ;
        double time_bnds(time, bnds) ;
                time_bnds:_Storage = "chunked" ;
                time_bnds:_ChunkSizes = 1, 2 ;
                time_bnds:_DeflateLevel = 1 ;
                time_bnds:_Filter = "32015,0" ;
                time_bnds:_Endianness = "little" ;
        double plev(plev) ;
                plev:units = "Pa" ;
                plev:axis = "Z" ;
                plev:positive = "down" ;
                plev:long_name = "pressure" ;
                plev:standard_name = "air_pressure" ;
                plev:_Storage = "contiguous" ;
                plev:_Endianness = "little" ;
        double lat(lat) ;
                lat:bounds = "lat_bnds" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:long_name = "Latitude" ;
                lat:standard_name = "latitude" ;
                lat:_Storage = "contiguous" ;
                lat:_Endianness = "little" ;
        double lat_bnds(lat, bnds) ;
                lat_bnds:_Storage = "chunked" ;
                lat_bnds:_ChunkSizes = 90, 2 ;
                lat_bnds:_DeflateLevel = 1 ;
                lat_bnds:_Filter = "32015,0" ;
                lat_bnds:_Endianness = "little" ;
        double lon(lon) ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "Longitude" ;
                lon:standard_name = "longitude" ;
                lon:_Storage = "contiguous" ;
                lon:_Endianness = "little" ;
        double lon_bnds(lon, bnds) ;
                lon_bnds:_Storage = "chunked" ;
                lon_bnds:_ChunkSizes = 180, 2 ;
                lon_bnds:_DeflateLevel = 1 ;
                lon_bnds:_Filter = "32015,0" ;
                lon_bnds:_Endianness = "little" ;
        float ta(time, plev, lat, lon) ;
                ta:standard_name = "air_temperature" ;
                ta:long_name = "Air Temperature" ;
                ta:comment = "Air Temperature" ;
                ta:units = "K" ;
                ta:cell_methods = "time: mean" ;
                ta:cell_measures = "area: areacella" ;
                ta:missing_value = 1.e+20f ;
                ta:_FillValue = 1.e+20f ;
                ta:history = "2024-06-11T01:26:28Z altered by CMOR: Converted type from \'d\' to \'f\'." ;
                ta:_Storage = "chunked" ;
                ta:_ChunkSizes = 1, 19, 90, 180 ;
                ta:_DeflateLevel = 1 ;
                ta:_Filter = "32015,0" ;
                ta:_Endianness = "little" ;

// global attributes:
                :Conventions = "CF-1.7 CMIP-6.2" ;
                :activity_id = "ISMIP6" ;
                :branch_method = "no parent" ;
                :branch_time_in_child = 59400. ;
                :branch_time_in_parent = 0. ;
                :contact = "Python Coder (coder@a.b.c.com)" ;
                :creation_date = "2024-06-11T01:26:28Z" ;
                :data_specs_version = "01.00.33" ;
                :experiment = "preindustrial control with interactive ice sheet" ;
                :experiment_id = "piControl-withism" ;
                :external_variables = "areacella" ;
                :forcing_index = 1 ;
                :frequency = "mon" ;
                :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.PCMDI.PCMDI-test-1-0.piControl-withism.none.r3i1p1f1" ;
                :grid = "native atmosphere regular grid (3x4 latxlon)" ;
                :grid_label = "gn" ;
                :history = "2024-06-11T01:26:28Z ;rewrote data to be consistent with ISMIP6 for variable ta found in table Amon.;\n",
                        "Output from archivcl_A1.nce/giccm_03_std_2xCO2_2256." ;
                :initialization_index = 1 ;
                :institution = "Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA" ;
                :institution_id = "PCMDI" ;
                :mip_era = "CMIP6" ;
                :nominal_resolution = "10000 km" ;
                :parent_activity_id = "no parent" ;
                :parent_experiment_id = "no parent" ;
                :parent_mip_era = "no parent" ;
                :parent_source_id = "no parent" ;
                :parent_time_units = "no parent" ;
                :parent_variant_label = "no parent" ;
                :physics_index = 1 ;
                :product = "model-output" ;
                :realization_index = 3 ;
                :realm = "atmos" ;
                :references = "Model described by Koder and Tolkien (J. Geophys. Res., 2001, 576-591).  Also see http://www.GICC.su/giccm/doc/index.html.  The ssp245 simulation is described in Dorkey et al. \'(Clim. Dyn., 2003, 323-357.)\'" ;
                :run_variant = "3rd realization" ;
                :source = "PCMDI-test 1.0 (1989): \n",
                        "aerosol: none\n",
                        "atmos: Earth1.0-gettingHotter (360 x 180 longitude/latitude; 50 levels; top level 0.1 mb)\n",
                        "atmosChem: none\n",
                        "land: Earth1.0\n",
                        "landIce: none\n",
                        "ocean: BlueMarble1.0-warming (360 x 180 longitude/latitude; 50 levels; top grid cell 0-10 m)\n",
                        "ocnBgchem: none\n",
                        "seaIce: Declining1.0-warming (360 x 180 longitude/latitude)" ;
                :source_id = "PCMDI-test-1-0" ;
                :source_type = "AOGCM ISM AER" ;
                :sub_experiment = "none" ;
                :sub_experiment_id = "none" ;
                :table_id = "Amon" ;
                :table_info = "Creation Date:(18 November 2020) MD5:67956a9cc0ef05fb4b373ee8dcc6b433" ;
                :title = "PCMDI-test-1-0 output prepared for CMIP6" ;
                :tracking_id = "hdl:21.14100/072ecd15-09e2-4157-914e-59a711add511" ;
                :variable_id = "ta" ;
                :variant_label = "r3i1p1f1" ;
                :license = "CMIP6 model data produced by Lawrence Livermore PCMDI is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
                :cmor_version = "3.8.0" ;
                :_NCProperties = "version=2,netcdf=4.9.2,hdf5=1.14.3" ;
                :_SuperblockVersion = 2 ;
                :_IsNetcdf4 = 0 ;
                :_Format = "netCDF-4 classic model" ;
}

@czender commented Jun 11, 2024

That all looks nominal. The last requirements to check are the contents of plugindir and whether the ncdump client actually searches there. Please post the results of 1. echo ${HDF5_PLUGIN_PATH} and 2. ls -l /Users/mauzey1/opt/anaconda3/envs/cmor_dev/lib/hdf5/plugin

@durack1 (Contributor, Author) commented Jun 11, 2024

--has-quantize -> no

@czender this is the issue, right?

@mauzey1 (Collaborator) commented Jun 11, 2024

The environment variable HDF5_PLUGIN_PATH is blank. Below is the HDF5 plugin directory.

ls -l /Users/mauzey1/opt/anaconda3/envs/cmor_dev/lib/hdf5/plugin
total 536
-rwxrwxr-x  7 mauzey1  28918  26128 Dec 10  2023 lib__nch5blosc.dylib
-rwxrwxr-x  7 mauzey1  28918  25672 Dec 10  2023 lib__nch5bzip2.dylib
-rwxrwxr-x  7 mauzey1  28918  25496 Dec 10  2023 lib__nch5deflate.dylib
-rwxrwxr-x  7 mauzey1  28918  25480 Dec 10  2023 lib__nch5fletcher32.dylib
-rwxrwxr-x  7 mauzey1  28918  25416 Dec 10  2023 lib__nch5shuffle.dylib
-rwxrwxr-x  7 mauzey1  28918  25976 Dec 10  2023 lib__nch5szip.dylib
-rwxrwxr-x  7 mauzey1  28918  25576 Dec 10  2023 lib__nch5zstd.dylib
-rwxrwxr-x  7 mauzey1  28918  35040 Dec 10  2023 lib__nczhdf5filters.dylib
-rwxrwxr-x  7 mauzey1  28918  34864 Dec 10  2023 lib__nczstdfilters.dylib

@czender commented Jun 11, 2024

@durack1 Good catch. I was focused on the plugin libraries for compression/decompression because that is usually the issue. I suppose the reported lack of quantization support could also mess things up. However, I'm a bit skeptical that quantization is the issue, because that should affect netCDF creation, not reading. (Quantized data are in IEEE format, so no special software is needed to read them.) And I thought @mauzey1 was just demonstrating that Zstandard was not working. @mauzey1, is your test dataset also supposedly quantized? None of the variables' metadata includes the quantization attribute (e.g., _QuantizeBitRoundNumberOfSignificantBits = 9). I suppose it's possible there are two separate issues...
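
To illustrate the parenthetical point: quantization only zeroes trailing mantissa bits at write time, so the stored values remain ordinary IEEE floats that any reader can handle. A toy sketch of the BitRound idea (it truncates rather than rounds, and is not libnetcdf's implementation):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy BitRound-style quantizer: keep nsb significant mantissa bits,
 * zero the rest. The result is still a valid IEEE-754 float. */
static float bitround(float x, int nsb)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);           /* reinterpret bits safely */
    u &= 0xFFFFFFFFu << (23 - nsb);     /* 23 = float mantissa bits */
    memcpy(&x, &u, sizeof x);
    return x;
}

int main(void)
{
    printf("%.9g -> %.9g\n", 101325.0f, bitround(101325.0f, 9));
    return 0;
}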

However, back to the plugin directory contents. @mauzey1 please retry your ncdump command after first setting export HDF5_PLUGIN_PATH=/Users/mauzey1/opt/anaconda3/envs/cmor_dev/lib/hdf5/plugin.

@mauzey1 (Collaborator) commented Jun 11, 2024

@czender I did try setting export HDF5_PLUGIN_PATH=/Users/mauzey1/opt/anaconda3/envs/cmor_dev/lib/hdf5/plugin but ncdump still gave me the same error.

@mauzey1 (Collaborator) commented Jun 11, 2024

As for the quantize changes, I haven't seen any errors arise from them; I'm currently just debugging errors caused by zstandard. The issue first came up in a test that was reading a NetCDF file in order to append more data to it. The error message contained the same text that I saw in the ncdump output.

C Traceback:
! In function: cmor_validateFilename
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: NetCDF Error (-101: NetCDF: HDF error) creating file: CMIP6/CMIP6/ISMIP6/PCMDI/PCMDI-test-1-0/piControl-withism/r3i1p1f1/Amon/ta/gn/v20240611/ta_Amon_PCMDI-test-1-0_piControl-withism_r3i1p1f1_gnUjtBI821241.nc
!
!!!!!!!!!!!!!!!!!!!!!!!!!

I edited the test to only run the part before the "appending" portion to get the file it was appending to. That's where I saw the error in the ncdump output.

@mauzey1 (Collaborator) commented Jun 11, 2024

@czender How do you use NetCDF with zstandard in your code? Did you need to pass certain flags to the compiler? I tried rebuilding CMOR with the HDF5_PLUGIN_PATH variable defined but still got the error.

How do you install NetCDF and other libraries? I'm using an Anaconda environment on MacOS. All of my packages are from conda-forge.

Here's the branch of CMOR with the quantize/zstandard changes: https://github.com/PCMDI/cmor/tree/725_expose_netcdf_quantize_and_zstandard

@czender commented Jun 11, 2024

@mauzey1 FWIW my conda-forge installation of netCDF 4.9.2 also reports --has-quantize -> no, and it correctly dumps Zstandard-compressed (and quantized) datasets.

The library names in my installation appear identical to yours, yet the dates and sizes differ. Mine are newer and larger than yours:

zender@spectral:~/anaconda/bin$ ls -l /Users/zender/anaconda/lib/hdf5/plugin
total 612
-rwxr-xr-x 2 zender staff 68928 Jun  6 04:03 lib__nch5blosc.dylib*
-rwxr-xr-x 2 zender staff 68480 Jun  6 04:03 lib__nch5bzip2.dylib*
-rwxr-xr-x 2 zender staff 68256 Jun  6 04:03 lib__nch5deflate.dylib*
-rwxr-xr-x 2 zender staff 68272 Jun  6 04:03 lib__nch5fletcher32.dylib*
-rwxr-xr-x 2 zender staff 68224 Jun  6 04:03 lib__nch5shuffle.dylib*
-rwxr-xr-x 2 zender staff 68784 Jun  6 04:03 lib__nch5szip.dylib*
-rwxr-xr-x 2 zender staff 68368 Jun  6 04:03 lib__nch5zstd.dylib*
-rwxr-xr-x 2 zender staff 69632 Jun  6 04:03 lib__nczhdf5filters.dylib*
-rwxr-xr-x 2 zender staff 69440 Jun  6 04:03 lib__nczstdfilters.dylib*

Any idea why this would be the case? Are you using conda-forge for all packages (otherwise library conflicts can easily arise)? In any case, please report the results of the ncdump command after executing conda update libnetcdf.

@mauzey1 (Collaborator) commented Jun 11, 2024

I tried creating a new environment to reinstall libraries hoping it would upgrade everything. However, I got the same error as before. It appears that I've gotten the same version of the HDF5 plugins. I wonder if it's due to dependencies in my environment setup that are giving me older versions of libraries.

@czender If you create the environment described in our source build instructions, do you get plugin versions different from those in your current environment?

@czender commented Jun 11, 2024

@mauzey1 Our messages crossed in the ether. To expand on the above, the ncdump from my conda-forge installations (on both MacOS and Linux machines) correctly dump Zstandard-compressed, quantized files. I am trying to emulate your environment. I did not mention that my typical workflow/environment on MacOS is to use a locally compiled recent daily snapshot of libnetcdf from the Unidata repository. My default compilers and other libraries on MacOS are all from Homebrew. And that setup also works fine. I do use a conda-forge based netCDF on acme1.llnl.gov. Everything works fine with that, and those libraries do appear to match yours in date, name, and (roughly) size:

zender1@acme1:~$ !513
ls -l /home/zender1/anaconda/lib/hdf5/plugin
total 164
-rwxr-xr-x. 2 zender1 climate 16392 Dec 10  2023 lib__nch5blosc.so
-rwxr-xr-x. 2 zender1 climate 16024 Dec 10  2023 lib__nch5bzip2.so
-rwxr-xr-x. 2 zender1 climate 15912 Dec 10  2023 lib__nch5deflate.so
-rwxr-xr-x. 2 zender1 climate 16064 Dec 10  2023 lib__nch5fletcher32.so
-rwxr-xr-x. 2 zender1 climate 15896 Dec 10  2023 lib__nch5shuffle.so
-rwxr-xr-x. 2 zender1 climate 16248 Dec 10  2023 lib__nch5szip.so
-rwxr-xr-x. 2 zender1 climate 16000 Dec 10  2023 lib__nch5zstd.so
-rwxr-xr-x. 2 zender1 climate 21784 Dec 10  2023 lib__nczhdf5filters.so
-rwxr-xr-x. 2 zender1 climate 21520 Dec 10  2023 lib__nczstdfilters.so

Here is an example of that conda-forge netCDF dumping a Zstandard-compressed, quantized file:

zender1@acme1:~$ which ncdump 
~/anaconda/bin/ncdump
zender1@acme1:~$ ncdump -h -s ~/foo1.nc
netcdf foo1 {
dimensions:
	time = UNLIMITED ; // (10 currently)
	lat = 2 ;
	lon = 4 ;
variables:
	float ps(time, lat, lon) ;
		ps:_QuantizeBitRoundNumberOfSignificantBits = 9 ;
		ps:lossy_compression = "compression_info" ;
		ps:lossy_compression_nsb = 9 ;
		ps:lossy_compression_maximum_relative_error = 0.0009765625f ;
		ps:standard_name = "surface_air_pressure" ;
		ps:units = "Pa" ;
		ps:_Storage = "chunked" ;
		ps:_ChunkSizes = 1145, 2, 4 ;
		ps:_Shuffle = "true" ;
		ps:_Filter = "32015,0" ;
		ps:_Endianness = "little" ;
	char compression_info ;
		compression_info:family = "quantize" ;
		compression_info:algorithm = "bitround" ;
		compression_info:implementation = "libnetcdf version 4.9.2" ;
		compression_info:_Storage = "contiguous" ;
	float ts(time) ;
		ts:_QuantizeBitRoundNumberOfSignificantBits = 9 ;
		ts:lossy_compression = "compression_info" ;
		ts:lossy_compression_nsb = 9 ;
		ts:lossy_compression_maximum_relative_error = 0.0009765625f ;
		ts:standard_name = "surface_temperature" ;
		ts:units = "K" ;
		ts:_Storage = "chunked" ;
		ts:_ChunkSizes = 1024 ;
		ts:_Shuffle = "true" ;
		ts:_Filter = "32015,0" ;
		ts:_Endianness = "little" ;

// global attributes:
		:Conventions = "CF-1.5" ;
		:history = "Tue Jun 11 10:46:26 2024: ncks -O -7 -C -v ps,ts --cmp=btr|shf|zst,0 /home/zender1/nco/data/in.nc /home/zender1/foo1.nc\n",
...
}

So the issue you have encountered is perplexing. Any ideas?

@mauzey1 (Collaborator) commented Jun 11, 2024

@czender Could you try building and running the tests of the CMOR branch I posted? Ideally using the environment described in the source build instructions.

Alternatively, could you send me a Zstandard-compressed file that I could try opening with my ncdump setup?

@czender commented Jun 11, 2024

@mauzey1 It would be easier for me if you would first try this (and add it to your CMOR instructions):
conda install blosc bzip2 zstd
and then report whether you achieve success with:
ncdump -s -h ~zender1/foo1.nc

@czender commented Jun 11, 2024

Here is a sample file. First rename it with mv foo1.nc.txt foo1.nc.
foo1.nc.txt

@mauzey1 (Collaborator) commented Jun 11, 2024

That file works for me without issue. Using my current dev environment, I got the following output.

$ ncdump -s -h foo1.nc 
netcdf foo1 {
dimensions:
        time = UNLIMITED ; // (10 currently)
        lat = 2 ;
        lon = 4 ;
variables:
        float ps(time, lat, lon) ;
                ps:_QuantizeBitRoundNumberOfSignificantBits = 9 ;
                ps:lossy_compression = "compression_info" ;
                ps:lossy_compression_nsb = 9 ;
                ps:lossy_compression_maximum_relative_error = 0.0009765625f ;
                ps:standard_name = "surface_air_pressure" ;
                ps:units = "Pa" ;
                ps:_Storage = "chunked" ;
                ps:_ChunkSizes = 1145, 2, 4 ;
                ps:_Shuffle = "true" ;
                ps:_Filter = "32015,0" ;
                ps:_Endianness = "little" ;
        char compression_info ;
                compression_info:family = "quantize" ;
                compression_info:algorithm = "bitround" ;
                compression_info:implementation = "libnetcdf version 4.9.3-development" ;
                compression_info:_Storage = "contiguous" ;
        float ts(time) ;
                ts:_QuantizeBitRoundNumberOfSignificantBits = 9 ;
                ts:lossy_compression = "compression_info" ;
                ts:lossy_compression_nsb = 9 ;
                ts:lossy_compression_maximum_relative_error = 0.0009765625f ;
                ts:standard_name = "surface_temperature" ;
                ts:units = "K" ;
                ts:_Storage = "chunked" ;
                ts:_ChunkSizes = 1024 ;
                ts:_Shuffle = "true" ;
                ts:_Filter = "32015,0" ;
                ts:_Endianness = "little" ;

// global attributes:
                :Conventions = "CF-1.5" ;
                :history = "Tue Jun 11 09:54:36 2024: ncks -O -7 -C -v ps,ts --cmp=btr|shf|zst,0 /Users/zender/nco/data/in.nc /Users/zender/foo1.nc\n",
                        "History global attribute.\n",
                        "Textual attributes like history often have embedded newlines like this.\n",
                        "Such newlines should serve as linebreaks on the screen to enhance legibility like this.\n",
                        "Friendly CDL converters print a single NC_CHAR attribute as a comma-separated list of strings where each embedded delimiter marks a linebreak. This makes poetry embedded in CDL much nicer to read (except for the printed literal \\n\'s---those are an eyesore):\n",
                        "\n",
                        "A POET by Hafiz\n",
                        "\n",
                        "A poet is someone\n",
                        "Who can pour light into a cup,\n",
                        "Then raise it to nourish\n",
                        "Your beautiful parched, holy mouth\n",
                        "" ;
                :lorem_ipsum = "The Lorem Ipsum attribute demonstrates the legibility of text without embedded linebreaks:\n",
                        "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Lady Gaga amat indueris vestimento laetus. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum." ;
                :julian_day = 200000.04 ;
                :RCS_Header = "$Header$" ;
                :NCO = "netCDF Operators version 5.2.5-alpha02 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco, Citation = 10.1016/j.envsoft.2008.03.004)" ;
                :_NCProperties = "version=2,netcdf=4.9.3-development,hdf5=1.14.3" ;
                :_SuperblockVersion = 2 ;
                :_IsNetcdf4 = 0 ;
                :_Format = "netCDF-4 classic model" ;
}

I'm trying to run conda install blosc bzip2 zstd in my base environment, but it is taking a long time. Running it in my 'cmor_dev' environment worked, but those libraries were already installed there.

@czender commented Jun 11, 2024

@mauzey1 Since you can dump the file I sent but not your original test file, my hunch is now that your original test file was somehow generated in a corrupted state. Do you agree? Put it somewhere on acme1 where I can read it and I'll take a look.

@mauzey1 (Collaborator) commented Jun 11, 2024

@czender Can you get it here?

bad_file.nc.txt

@czender commented Jun 11, 2024

The file appears to be fine. Both ncdump and ncks can read it on my MacOS and Linux machines, with various netCDF libraries, including conda-forge. I think that narrows the potential causes down to the configuration of your reading environment. The writing appears to work fine. Umm...the only thing that comes to mind is a possible mis-configuration of LD_LIBRARY_PATH which, if it exists at all, should point to a netCDF library with correctly linked codecs, such as the conda-forge netCDF library that wrote the file.

@mauzey1 (Collaborator) commented Jun 11, 2024

I tried reading the same file on a Linux machine with a conda environment made with conda create -n ncdf_env -c conda-forge netcdf4 hdf5 blosc bzip2 zstd and got the following output.

-bash-4.2$ ls -lh ~/anaconda3/envs/ncdf_env/lib/hdf5/plugin/
total 164K
-rwxr-x---. 2 mauzey1 climate 17K Jun  6 04:00 lib__nch5blosc.so
-rwxr-x---. 2 mauzey1 climate 16K Jun  6 04:00 lib__nch5bzip2.so
-rwxr-x---. 2 mauzey1 climate 16K Jun  6 04:00 lib__nch5deflate.so
-rwxr-x---. 2 mauzey1 climate 16K Jun  6 04:00 lib__nch5fletcher32.so
-rwxr-x---. 2 mauzey1 climate 16K Jun  6 04:00 lib__nch5shuffle.so
-rwxr-x---. 2 mauzey1 climate 16K Jun  6 04:00 lib__nch5szip.so
-rwxr-x---. 2 mauzey1 climate 16K Jun  6 04:00 lib__nch5zstd.so
-rwxr-x---. 2 mauzey1 climate 22K Jun  6 04:00 lib__nczhdf5filters.so
-rwxr-x---. 2 mauzey1 climate 22K Jun  6 04:00 lib__nczstdfilters.so
-bash-4.2$ pwd
/home/mauzey1
-bash-4.2$ export HDF5_PLUGIN_PATH=/home/mauzey1/anaconda3/envs/ncdf_env/lib/hdf5/plugin/
-bash-4.2$ ncdump bad_file.nc 
netcdf bad_file {
dimensions:
        time = UNLIMITED ; // (6 currently)
        plev = 19 ;
        lat = 90 ;
        lon = 180 ;
        bnds = 2 ;
variables:
        double time(time) ;
                time:bounds = "time_bnds" ;
                time:units = "days since 2008" ;
                time:calendar = "360_day" ;
                time:axis = "T" ;
                time:long_name = "time" ;
                time:standard_name = "time" ;
        double time_bnds(time, bnds) ;
        double plev(plev) ;
                plev:units = "Pa" ;
                plev:axis = "Z" ;
                plev:positive = "down" ;
                plev:long_name = "pressure" ;
                plev:standard_name = "air_pressure" ;
        double lat(lat) ;
                lat:bounds = "lat_bnds" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:long_name = "Latitude" ;
                lat:standard_name = "latitude" ;
        double lat_bnds(lat, bnds) ;
        double lon(lon) ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "Longitude" ;
                lon:standard_name = "longitude" ;
        double lon_bnds(lon, bnds) ;
        float ta(time, plev, lat, lon) ;
                ta:standard_name = "air_temperature" ;
                ta:long_name = "Air Temperature" ;
                ta:comment = "Air Temperature" ;
                ta:units = "K" ;
                ta:cell_methods = "time: mean" ;
                ta:cell_measures = "area: areacella" ;
                ta:missing_value = 1.e+20f ;
                ta:_FillValue = 1.e+20f ;
                ta:history = "2024-06-11T18:43:28Z altered by CMOR: Converted type from \'d\' to \'f\'." ;

// global attributes:
                :Conventions = "CF-1.7 CMIP-6.2" ;
                :activity_id = "ISMIP6" ;
                :branch_method = "no parent" ;
                :branch_time_in_child = 59400. ;
                :branch_time_in_parent = 0. ;
                :contact = "Python Coder (coder@a.b.c.com)" ;
                :creation_date = "2024-06-11T18:43:28Z" ;
                :data_specs_version = "01.00.33" ;
                :experiment = "preindustrial control with interactive ice sheet" ;
                :experiment_id = "piControl-withism" ;
                :external_variables = "areacella" ;
                :forcing_index = 1 ;
                :frequency = "mon" ;
                :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.PCMDI.PCMDI-test-1-0.piControl-withism.none.r3i1p1f1" ;
                :grid = "native atmosphere regular grid (3x4 latxlon)" ;
                :grid_label = "gn" ;
                :history = "2024-06-11T18:43:28Z ;rewrote data to be consistent with ISMIP6 for variable ta found in table Amon.;\n",
                        "Output from archivcl_A1.nce/giccm_03_std_2xCO2_2256." ;
                :initialization_index = 1 ;
                :institution = "Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA" ;
                :institution_id = "PCMDI" ;
                :mip_era = "CMIP6" ;
                :nominal_resolution = "10000 km" ;
                :parent_activity_id = "no parent" ;
                :parent_experiment_id = "no parent" ;
                :parent_mip_era = "no parent" ;
                :parent_source_id = "no parent" ;
                :parent_time_units = "no parent" ;
                :parent_variant_label = "no parent" ;
                :physics_index = 1 ;
                :product = "model-output" ;
                :realization_index = 3 ;
                :realm = "atmos" ;
                :references = "Model described by Koder and Tolkien (J. Geophys. Res., 2001, 576-591).  Also see http://www.GICC.su/giccm/doc/index.html.  The ssp245 simulation is described in Dorkey et al. \'(Clim. Dyn., 2003, 323-357.)\'" ;
                :run_variant = "3rd realization" ;
                :source = "PCMDI-test 1.0 (1989): \n",
                        "aerosol: none\n",
                        "atmos: Earth1.0-gettingHotter (360 x 180 longitude/latitude; 50 levels; top level 0.1 mb)\n",
                        "atmosChem: none\n",
                        "land: Earth1.0\n",
                        "landIce: none\n",
                        "ocean: BlueMarble1.0-warming (360 x 180 longitude/latitude; 50 levels; top grid cell 0-10 m)\n",
                        "ocnBgchem: none\n",
                        "seaIce: Declining1.0-warming (360 x 180 longitude/latitude)" ;
                :source_id = "PCMDI-test-1-0" ;
                :source_type = "AOGCM ISM AER" ;
                :sub_experiment = "none" ;
                :sub_experiment_id = "none" ;
                :table_id = "Amon" ;
                :table_info = "Creation Date:(18 November 2020) MD5:67956a9cc0ef05fb4b373ee8dcc6b433" ;
                :title = "PCMDI-test-1-0 output prepared for CMIP6" ;
                :tracking_id = "hdl:21.14100/665a2970-e412-4ddf-8e43-2b7ee12b577d" ;
                :variable_id = "ta" ;
                :variant_label = "r3i1p1f1" ;
                :license = "CMIP6 model data produced by Lawrence Livermore PCMDI is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
                :cmor_version = "3.8.0" ;
data:

 time = 15, 45, 75, 105, 135, 165 ;

 time_bnds =
  0, 30,
NetCDF: HDF error
Location: file ?; fcn ? line 478

Output of nc-config --all

-bash-4.2$ nc-config --all

This netCDF 4.9.2 has been built with the following features: 

  --cc            -> x86_64-conda-linux-gnu-cc
  --cflags        -> -I/home/mauzey1/anaconda3/envs/ncdf_env/include
  --libs          -> -L/home/mauzey1/anaconda3/envs/ncdf_env/lib -lnetcdf
  --static        -> -lmfhdf -ldf -lhdf5_hl -lhdf5 -lcrypto -lcurl -lpthread -lsz -lz -ldl -lm -lzip -lblosc -lzstd -lbz2 -lxml2


  --has-dap          -> yes
  --has-dap2         -> yes
  --has-dap4         -> yes
  --has-nc2          -> yes
  --has-nc4          -> yes
  --has-hdf5         -> yes
  --has-hdf4         -> yes
  --has-logging      -> no
  --has-pnetcdf      -> no
  --has-szlib        -> no
  --has-cdf5         -> yes
  --has-parallel4    -> no
  --has-parallel     -> no
  --has-nczarr       -> yes
  --has-zstd         -> yes
  --has-benchmarks   -> no
  --has-multifilters -> no
  --has-stdfilters   -> deflate szip blosc zstd bzip2
  --has-quantize     -> no

  --prefix        -> /home/mauzey1/anaconda3/envs/ncdf_env
  --includedir    -> /home/mauzey1/anaconda3/envs/ncdf_env/include
  --libdir        -> /home/mauzey1/anaconda3/envs/ncdf_env/lib
  --plugindir     -> /home/mauzey1/anaconda3/envs/ncdf_env/lib/hdf5/plugin
  --version       -> netCDF 4.9.2

Packages installed in this environment:

-bash-4.2$ conda list
# packages in environment at /home/mauzey1/anaconda3/envs/ncdf_env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
blosc                     1.21.5               hc2324a3_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
ca-certificates           2024.6.2             hbcca054_0    conda-forge
certifi                   2024.6.2           pyhd8ed1ab_0    conda-forge
cftime                    1.6.4           py312h085067d_0    conda-forge
hdf4                      4.2.15               h2a13503_7    conda-forge
hdf5                      1.14.3          nompi_hdf9ad27_105    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 hf3520f5_3    conda-forge
libaec                    1.1.3                h59595ed_0    conda-forge
libblas                   3.9.0           22_linux64_openblas    conda-forge
libcblas                  3.9.0           22_linux64_openblas    conda-forge
libcurl                   8.8.0                hca28451_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h77fa898_7    conda-forge
libgfortran-ng            13.2.0               h69a702a_7    conda-forge
libgfortran5              13.2.0               hca663fb_7    conda-forge
libgomp                   13.2.0               h77fa898_7    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           22_linux64_openblas    conda-forge
libnetcdf                 4.9.2           nompi_h135f659_114    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.27          pthreads_h413a1c8_0    conda-forge
libsqlite                 3.46.0               hde9e2c9_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               hc0a3c3a_7    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.7               hc051c1a_1    conda-forge
libzip                    1.10.1               h2629f0a_3    conda-forge
libzlib                   1.3.1                h4ab18f5_1    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
ncurses                   6.5                  h59595ed_0    conda-forge
netcdf4                   1.6.5           nompi_py312h39d4375_102    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
openssl                   3.3.1                h4ab18f5_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
python                    3.12.3          hab00c5b_0_cpython    conda-forge
python_abi                3.12                    4_cp312    conda-forge
readline                  8.2                  h8228510_1    conda-forge
setuptools                70.0.0             pyhd8ed1ab_0    conda-forge
snappy                    1.2.0                hdb0a2a9_1    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.3.1                h4ab18f5_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

All of our CI jobs, running on both Linux and Mac, are failing on the same test that produced that file when testing the zstandard/quantize branch.

Here is said test: https://github.com/PCMDI/cmor/blob/725_expose_netcdf_quantize_and_zstandard/Test/test_python_appending.py

@czender commented Jun 11, 2024

@mauzey1 My bad. You are not crazy :) I had only been dumping metadata, under the mistaken impression that the issue would show itself when printing metadata. I can reproduce the error you encounter when printing the full data with ncdump:

zender@spectral:~$ ~/anaconda/bin/ncdump bad_file.nc
...
time_bnds =
 0, 30,
NetCDF: HDF error
Location: file ?; fcn ? line 478

Using the latest netCDF snapshot, the error comes from a different source line:

zender@spectral:~$ ncdump bad_file.nc
...
time_bnds =
  0, 30,
NetCDF: HDF error
Location: file vardata.c; fcn print_rows line 483

NCO's ncks gives a more informative error message:

zender@spectral:~$ ncks bad_file.nc
...
    time = 15, 45, 75, 105, 135, 165 ;

ERROR: nco_get_vara() failed to nc_get_vara() variable "time_bnds"
ERROR NC_EHDFERR Error at HDF5 layer
HINT: NC_EHDFERR errors indicate that the HDF5-backend to netCDF is unable to perform the requested task. NCO can receive this devilishly inscrutable error for a variety of possible reasons including: 1) The run-time dynamic linker attempts to resolve calls from the netCDF library to the HDF library with an HDF5 libhdf5.a that is incompatible with the version used to build NCO and netCDF. 2) The file system does not allow the HDF5 flock() function, as of HDF5 1.10.x, to enable multiple processes to open the same file for reading, a feature known as SWMR (Single Write Multiple Read). The fix is to disable the HDF5 flock() by setting an environment variable thusly: "export HDF5_USE_FILE_LOCKING=FALSE". 3) An incorrect netCDF4 library implementation of a procedure (e.g., nc_rename_var()) in terms of HDF function calls (e.g., HDF5Lmove()) manifests an error or inconsistent state within the HDF5 layer. This often occurs during renaming operations (https://github.com/Unidata/netcdf-c/issues/597). 4) Attempting to compress or decompress a netCDF4 dataset with a non-standard (i.e., non-DEFLATE) filter when the requisite shared library to encode/decode that compression filter is not present in either the default location (/usr/local/hdf5/lib/plugin) or in the user-configurable location referred to by the HDF5_PLUGIN_PATH environment variable. One can determine if missing plugin libraries are the culprit by dumping the hidden attributes of the dataset with, e.g., ncks --hdn -m in.nc or ncdump -s -h in.nc. Any variables with (hidden) "_Filter" attributes require the corresponding shared libraries to be located in HDF5_PLUGIN_PATH. Some HDF5 implementations (at least MacOSX with MacPorts as of 20200907) may also require explicitly setting the plugin path in the environment, even for the default location! To test this, re-try your NCO command after doing this: "export HDF5_PLUGIN_PATH=/usr/local/hdf5/lib/plugin". 5) Bad vibes.
nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco_get_vara()
nco_err_exit(): ERROR Error code is -101. Translation into English with nc_strerror(-101) is "NetCDF: HDF error"
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)

Based on this error message, I looked more carefully at the metadata for time_bnds, which is where the HDF error occurs:

zender@spectral:~$ ncks -D 2 --hdn -m -C -v time_bnds bad_file.nc
netcdf bad_file {
  // ncgen -k "netCDF-4 classic model" -b -o bad_file.nc bad_file.cdl
  dimensions:
    bnds = 2 ; // ID = 4
    time = UNLIMITED ; // (6 currently) ID = 0

  variables:
    double time_bnds(time,bnds) ; // RAM size = 6*2*sizeof(NC_DOUBLE) = 12*8 = 96 bytes, ID = 1
      time_bnds:_Storage = "chunked" ; // char
      time_bnds:_ChunkSizes = 1, 2 ; // int
      time_bnds:_Filter = "32015,3" ; // char codec(s): Zstandard
      time_bnds:_DeflateLevel = 1 ; // int
      time_bnds:_Endianness = "little" ; // char
} // group /

Next, I verified that ncks can dump the entire file except time_bnds:

zender@spectral:~$ ncks -C -x -v time_bnds bad_file.nc
netcdf bad_file {
...
} // group /

A few points about all this:
1. The HDF error occurs after successfully reading and printing the other compressed variables, including ta.
2. time_bnds is a small variable with only 12 values. Are you sure that all the values written are "normal"? Related to this, the chunksize of time_bnds is only 1*2 = 2 values per chunk. I wonder if there is a bug in one of the codecs (or their combination) that only reveals itself when chunksizes are super small. Maybe try not compressing variables with chunksizes smaller than a minimum threshold (e.g., 16).
3. I did not notice earlier the strange fact that both DEFLATE and Zstandard codecs have been applied. Why does CMOR use two lossless codecs? It should be possible to do this, yet little if anything is gained in compression by doing so. Moreover, the Shuffle filter has not been activated. Doing that would save much more space than "double compressing" the data. Please try applying only one lossless codec and see if that resolves the HDF error. Again, the netCDF filter code is written to support an arbitrary number of codecs, and I have tested that capability with 3+ lossless codecs in the past, yet it does seem possible that doing so is related to this bug.

Given all this, my hunch is that the small chunksize of time_bnds, not the "double compression", is what triggers the bug that causes the HDF error.
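
A sketch of what points 2 and 3 might look like in code: Shuffle plus a single lossless codec, with compression skipped entirely when the chunk payload is tiny. The threshold, level, and chunk_nelems parameter are illustrative, not existing CMOR names:

#include <netcdf.h>
#include <netcdf_filter.h>

#define MIN_CHUNK_NELEMS 16   /* illustrative threshold from point 2 */

/* Sketch: one lossless codec with Shuffle; microscopic variables are
 * left uncompressed to avoid the suspect small-chunk case. */
static int define_single_codec(int ncid, int varid, size_t chunk_nelems)
{
    int status;

    if (chunk_nelems < MIN_CHUNK_NELEMS)
        return NC_NOERR;      /* tiny payload: compression buys nothing */

    /* Enable Shuffle only; deflate stays off (deflate flag = 0). */
    if ((status = nc_def_var_deflate(ncid, varid, 1 /*shuffle*/,
                                     0 /*deflate*/, 0 /*level*/)) != NC_NOERR)
        return status;

    return nc_def_var_zstandard(ncid, varid, 3 /*illustrative level*/);
}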

@mauzey1 (Collaborator) commented Jun 11, 2024

@czender

I tried disabling the use of zstandard compression on the bounds variables, and ncdump read the generated file without issue. I also tried disabling deflate instead, and that likewise produced a file that ncdump could read without issue.

I noticed that some files generated by the tests didn't give ncdump issues. I'll look at the other files to see what could be causing the issue.

Thank you for helping debug this issue.

@taylor13 (Collaborator)

@mauzey1 If it is easy to do, I would be curious to see what the values of time_bnds are in the case where you disabled zstandard compression. Could you print those out? This would help answer the question "Are you sure that all the values written are 'normal'?" Of course, if the same input bounds are used for other tests that are successfully written, this would be unlikely to be the cause of any problems.

@mauzey1 (Collaborator) commented Jun 11, 2024

@taylor13 Here is the ncdump output with the time bounds displayed.

$ ncdump -v time_bnds CMIP6/CMIP6/ISMIP6/PCMDI/PCMDI-test-1-0/piControl-withism/r3i1p1f1/Amon/ta/gn/v20240611/ta_Amon_PCMDI-test-1-0_piControl-withism_r3i1p1f1_gn_200801-200806.nc 
netcdf ta_Amon_PCMDI-test-1-0_piControl-withism_r3i1p1f1_gn_200801-200806 {
dimensions:
        time = UNLIMITED ; // (6 currently)
        plev = 19 ;
        lat = 90 ;
        lon = 180 ;
        bnds = 2 ;
variables:
        double time(time) ;
                time:bounds = "time_bnds" ;
                time:units = "days since 2008" ;
                time:calendar = "360_day" ;
                time:axis = "T" ;
                time:long_name = "time" ;
                time:standard_name = "time" ;
        double time_bnds(time, bnds) ;
        double plev(plev) ;
                plev:units = "Pa" ;
                plev:axis = "Z" ;
                plev:positive = "down" ;
                plev:long_name = "pressure" ;
                plev:standard_name = "air_pressure" ;
        double lat(lat) ;
                lat:bounds = "lat_bnds" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:long_name = "Latitude" ;
                lat:standard_name = "latitude" ;
        double lat_bnds(lat, bnds) ;
        double lon(lon) ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "Longitude" ;
                lon:standard_name = "longitude" ;
        double lon_bnds(lon, bnds) ;
        float ta(time, plev, lat, lon) ;
                ta:standard_name = "air_temperature" ;
                ta:long_name = "Air Temperature" ;
                ta:comment = "Air Temperature" ;
                ta:units = "K" ;
                ta:cell_methods = "time: mean" ;
                ta:cell_measures = "area: areacella" ;
                ta:missing_value = 1.e+20f ;
                ta:_FillValue = 1.e+20f ;
                ta:history = "2024-06-11T22:31:01Z altered by CMOR: Converted type from \'d\' to \'f\'." ;

// global attributes:
                :Conventions = "CF-1.7 CMIP-6.2" ;
                :activity_id = "ISMIP6" ;
                :branch_method = "no parent" ;
                :branch_time_in_child = 59400. ;
                :branch_time_in_parent = 0. ;
                :contact = "Python Coder (coder@a.b.c.com)" ;
                :creation_date = "2024-06-11T22:31:01Z" ;
                :data_specs_version = "01.00.33" ;
                :experiment = "preindustrial control with interactive ice sheet" ;
                :experiment_id = "piControl-withism" ;
                :external_variables = "areacella" ;
                :forcing_index = 1 ;
                :frequency = "mon" ;
                :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.PCMDI.PCMDI-test-1-0.piControl-withism.none.r3i1p1f1" ;
                :grid = "native atmosphere regular grid (3x4 latxlon)" ;
                :grid_label = "gn" ;
                :history = "2024-06-11T22:31:01Z ;rewrote data to be consistent with ISMIP6 for variable ta found in table Amon.;\n",
                        "Output from archivcl_A1.nce/giccm_03_std_2xCO2_2256." ;
                :initialization_index = 1 ;
                :institution = "Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA" ;
                :institution_id = "PCMDI" ;
                :mip_era = "CMIP6" ;
                :nominal_resolution = "10000 km" ;
                :parent_activity_id = "no parent" ;
                :parent_experiment_id = "no parent" ;
                :parent_mip_era = "no parent" ;
                :parent_source_id = "no parent" ;
                :parent_time_units = "no parent" ;
                :parent_variant_label = "no parent" ;
                :physics_index = 1 ;
                :product = "model-output" ;
                :realization_index = 3 ;
                :realm = "atmos" ;
                :references = "Model described by Koder and Tolkien (J. Geophys. Res., 2001, 576-591).  Also see http://www.GICC.su/giccm/doc/index.html.  The ssp245 simulation is described in Dorkey et al. \'(Clim. Dyn., 2003, 323-357.)\'" ;
                :run_variant = "3rd realization" ;
                :source = "PCMDI-test 1.0 (1989): \n",
                        "aerosol: none\n",
                        "atmos: Earth1.0-gettingHotter (360 x 180 longitude/latitude; 50 levels; top level 0.1 mb)\n",
                        "atmosChem: none\n",
                        "land: Earth1.0\n",
                        "landIce: none\n",
                        "ocean: BlueMarble1.0-warming (360 x 180 longitude/latitude; 50 levels; top grid cell 0-10 m)\n",
                        "ocnBgchem: none\n",
                        "seaIce: Declining1.0-warming (360 x 180 longitude/latitude)" ;
                :source_id = "PCMDI-test-1-0" ;
                :source_type = "AOGCM ISM AER" ;
                :sub_experiment = "none" ;
                :sub_experiment_id = "none" ;
                :table_id = "Amon" ;
                :table_info = "Creation Date:(18 November 2020) MD5:67956a9cc0ef05fb4b373ee8dcc6b433" ;
                :title = "PCMDI-test-1-0 output prepared for CMIP6" ;
                :tracking_id = "hdl:21.14100/2c77d159-cda7-4009-a9e7-3bb813978041" ;
                :variable_id = "ta" ;
                :variant_label = "r3i1p1f1" ;
                :license = "CMIP6 model data produced by Lawrence Livermore PCMDI is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
                :cmor_version = "3.8.0" ;
data:

 time_bnds =
  0, 30,
  30, 60,
  60, 90,
  90, 120,
  120, 150,
  150, 180 ;
}

@taylor13 (Collaborator)

O.K. thanks. Nothing fishy about those numbers.

@czender commented Jun 11, 2024

It seems to be due to an interaction of "double compression" (i.e., two lossless codecs) with microscopic chunk sizes.
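
A minimal stand-alone reproducer sketch of that interaction, assuming the trigger is deflate plus Zstandard on a chunk of only two doubles, as with time_bnds; whether the read-back actually fails will depend on the libnetcdf, HDF5, and plugin versions in play:

#include <stdio.h>
#include <netcdf.h>
#include <netcdf_filter.h>

#define CHK(s) do { int _s = (s); if (_s != NC_NOERR) { \
    fprintf(stderr, "%s\n", nc_strerror(_s)); return 1; } } while (0)

int main(void)
{
    int ncid, dim_t, dim_b, varid;
    size_t chunks[2] = {1, 2};   /* microscopic chunks, as in time_bnds */
    size_t start[2] = {0, 0}, count[2] = {6, 2};
    double bnds[12] = {0,30, 30,60, 60,90, 90,120, 120,150, 150,180};
    double out[12];

    CHK(nc_create("repro.nc", NC_CLOBBER | NC_NETCDF4, &ncid));
    CHK(nc_def_dim(ncid, "time", NC_UNLIMITED, &dim_t));
    CHK(nc_def_dim(ncid, "bnds", 2, &dim_b));
    int dims[2] = {dim_t, dim_b};
    CHK(nc_def_var(ncid, "time_bnds", NC_DOUBLE, 2, dims, &varid));
    CHK(nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks));
    CHK(nc_def_var_deflate(ncid, varid, 0, 1, 1));  /* deflate level 1 */
    CHK(nc_def_var_zstandard(ncid, varid, 0));      /* plus zstd "32015,0" */
    CHK(nc_enddef(ncid));
    CHK(nc_put_vara_double(ncid, varid, start, count, bnds));
    CHK(nc_close(ncid));

    /* Read-back is where the thread's "NetCDF: HDF error" surfaced. */
    CHK(nc_open("repro.nc", NC_NOWRITE, &ncid));
    CHK(nc_inq_varid(ncid, "time_bnds", &varid));
    CHK(nc_get_vara_double(ncid, varid, start, count, out));
    CHK(nc_close(ncid));
    printf("read back OK\n");
    return 0;
}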

@taylor13 (Collaborator)

Perhaps, given @czender's comment, one option would be to simply forbid "double" compression, which really doesn't buy much, does impact performance, and in this case messes up the file. If we forbid this configuration, will we solve all our problems?
Of course, the problem may rear its head in other circumstances, so it would be worth pinning down the root cause so that it doesn't go unnoticed when it occurs even without double compression.
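
If the project goes that route, the guard could be as simple as choosing exactly one lossless codec at variable-definition time; a sketch with hypothetical parameter names, not CMOR's actual configuration variables:

#include <netcdf.h>
#include <netcdf_filter.h>

/* Sketch: apply exactly one lossless codec per variable, never both. */
static int define_one_codec(int ncid, int varid, int use_zstandard,
                            int zstd_level, int shuffle, int deflate_level)
{
    if (use_zstandard)
        return nc_def_var_zstandard(ncid, varid, zstd_level);
    if (deflate_level > 0)
        return nc_def_var_deflate(ncid, varid, shuffle, 1, deflate_level);
    return NC_NOERR;   /* no compression requested */
}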

@mauzey1 (Collaborator) commented Jun 12, 2024

I did an experiment with the test that generated the file that had problems with the time bounds. I reduced the time, latitude, and longitude dimensions of the generated data. The resulting file didn't have the problem that the original one did.

$ ncdump -s CMIP6/CMIP6/ISMIP6/PCMDI/PCMDI-test-1-0/piControl-withism/r3i1p1f1/Amon/ta/gn/v20240611/ta_Amon_PCMDI-test-1-0_piControl-withism_r3i1p1f1_gn_200801-200801.nc 
netcdf ta_Amon_PCMDI-test-1-0_piControl-withism_r3i1p1f1_gn_200801-200801 {
dimensions:
        time = UNLIMITED ; // (1 currently)
        plev = 19 ;
        lat = 2 ;
        lon = 2 ;
        bnds = 2 ;
variables:
        double time(time) ;
                time:bounds = "time_bnds" ;
                time:units = "days since 2008" ;
                time:calendar = "360_day" ;
                time:axis = "T" ;
                time:long_name = "time" ;
                time:standard_name = "time" ;
                time:_Storage = "chunked" ;
                time:_ChunkSizes = 512 ;
                time:_Endianness = "little" ;
        double time_bnds(time, bnds) ;
                time_bnds:_Storage = "chunked" ;
                time_bnds:_ChunkSizes = 1, 2 ;
                time_bnds:_DeflateLevel = 1 ;
                time_bnds:_Filter = "32015,3" ;
                time_bnds:_Endianness = "little" ;
        double plev(plev) ;
                plev:units = "Pa" ;
                plev:axis = "Z" ;
                plev:positive = "down" ;
                plev:long_name = "pressure" ;
                plev:standard_name = "air_pressure" ;
                plev:_Storage = "contiguous" ;
                plev:_Endianness = "little" ;
        double lat(lat) ;
                lat:bounds = "lat_bnds" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:long_name = "Latitude" ;
                lat:standard_name = "latitude" ;
                lat:_Storage = "contiguous" ;
                lat:_Endianness = "little" ;
        double lat_bnds(lat, bnds) ;
                lat_bnds:_Storage = "chunked" ;
                lat_bnds:_ChunkSizes = 2, 2 ;
                lat_bnds:_DeflateLevel = 1 ;
                lat_bnds:_Filter = "32015,3" ;
                lat_bnds:_Endianness = "little" ;
        double lon(lon) ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "Longitude" ;
                lon:standard_name = "longitude" ;
                lon:_Storage = "contiguous" ;
                lon:_Endianness = "little" ;
        double lon_bnds(lon, bnds) ;
                lon_bnds:_Storage = "chunked" ;
                lon_bnds:_ChunkSizes = 2, 2 ;
                lon_bnds:_DeflateLevel = 1 ;
                lon_bnds:_Filter = "32015,3" ;
                lon_bnds:_Endianness = "little" ;
        float ta(time, plev, lat, lon) ;
                ta:standard_name = "air_temperature" ;
                ta:long_name = "Air Temperature" ;
                ta:comment = "Air Temperature" ;
                ta:units = "K" ;
                ta:cell_methods = "time: mean" ;
                ta:cell_measures = "area: areacella" ;
                ta:missing_value = 1.e+20f ;
                ta:_FillValue = 1.e+20f ;
                ta:history = "2024-06-12T00:48:56Z altered by CMOR: Converted type from \'d\' to \'f\'." ;
                ta:_Storage = "chunked" ;
                ta:_ChunkSizes = 1, 19, 2, 2 ;
                ta:_DeflateLevel = 1 ;
                ta:_Filter = "32015,3" ;
                ta:_Endianness = "little" ;

// global attributes:
                :Conventions = "CF-1.7 CMIP-6.2" ;
                :activity_id = "ISMIP6" ;
                :branch_method = "no parent" ;
                :branch_time_in_child = 59400. ;
                :branch_time_in_parent = 0. ;
                :contact = "Python Coder (coder@a.b.c.com)" ;
                :creation_date = "2024-06-12T00:48:56Z" ;
                :data_specs_version = "01.00.33" ;
                :experiment = "preindustrial control with interactive ice sheet" ;
                :experiment_id = "piControl-withism" ;
                :external_variables = "areacella" ;
                :forcing_index = 1 ;
                :frequency = "mon" ;
                :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.PCMDI.PCMDI-test-1-0.piControl-withism.none.r3i1p1f1" ;
                :grid = "native atmosphere regular grid (3x4 latxlon)" ;
                :grid_label = "gn" ;
                :history = "2024-06-12T00:48:56Z ;rewrote data to be consistent with ISMIP6 for variable ta found in table Amon.;\n",
                        "Output from archivcl_A1.nce/giccm_03_std_2xCO2_2256." ;
                :initialization_index = 1 ;
                :institution = "Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA" ;
                :institution_id = "PCMDI" ;
                :mip_era = "CMIP6" ;
                :nominal_resolution = "10000 km" ;
                :parent_activity_id = "no parent" ;
                :parent_experiment_id = "no parent" ;
                :parent_mip_era = "no parent" ;
                :parent_source_id = "no parent" ;
                :parent_time_units = "no parent" ;
                :parent_variant_label = "no parent" ;
                :physics_index = 1 ;
                :product = "model-output" ;
                :realization_index = 3 ;
                :realm = "atmos" ;
                :references = "Model described by Koder and Tolkien (J. Geophys. Res., 2001, 576-591).  Also see http://www.GICC.su/giccm/doc/index.html.  The ssp245 simulation is described in Dorkey et al. \'(Clim. Dyn., 2003, 323-357.)\'" ;
                :run_variant = "3rd realization" ;
                :source = "PCMDI-test 1.0 (1989): \n",
                        "aerosol: none\n",
                        "atmos: Earth1.0-gettingHotter (360 x 180 longitude/latitude; 50 levels; top level 0.1 mb)\n",
                        "atmosChem: none\n",
                        "land: Earth1.0\n",
                        "landIce: none\n",
                        "ocean: BlueMarble1.0-warming (360 x 180 longitude/latitude; 50 levels; top grid cell 0-10 m)\n",
                        "ocnBgchem: none\n",
                        "seaIce: Declining1.0-warming (360 x 180 longitude/latitude)" ;
                :source_id = "PCMDI-test-1-0" ;
                :source_type = "AOGCM ISM AER" ;
                :sub_experiment = "none" ;
                :sub_experiment_id = "none" ;
                :table_id = "Amon" ;
                :table_info = "Creation Date:(18 November 2020) MD5:67956a9cc0ef05fb4b373ee8dcc6b433" ;
                :title = "PCMDI-test-1-0 output prepared for CMIP6" ;
                :tracking_id = "hdl:21.14100/640b0704-f139-48a7-bb0d-8dee6938cb4e" ;
                :variable_id = "ta" ;
                :variant_label = "r3i1p1f1" ;
                :license = "CMIP6 model data produced by Lawrence Livermore PCMDI is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
                :cmor_version = "3.8.0" ;
                :_NCProperties = "version=2,netcdf=4.9.2,hdf5=1.14.3" ;
                :_SuperblockVersion = 2 ;
                :_IsNetcdf4 = 0 ;
                :_Format = "netCDF-4 classic model" ;
data:

 time = 15 ;

 time_bnds =
  0, 30 ;

 plev = 100000, 92500, 85000, 70000, 60000, 50000, 40000, 30000, 25000, 
    20000, 15000, 10000, 7000, 5000, 3000, 2000, 1000, 500, 100 ;

 lat = -45, 45 ;

 lat_bnds =
  -90, 0,
  0, 90 ;

 lon = 90, 270 ;

 lon_bnds =
  0, 180,
  180, 360 ;

 ta =
  280.9897, 280.7897,
  280.6212, 280.9661,
  280.3557, 280.2559,
  280.6284, 280.2772,
  280.7416, 280.5548,
  280.1823, 280.6456,
  280.9416, 280.7458,
  280.9481, 280.3607,
  280.2813, 280.4308,
  280.7332, 280.6132,
  280.033, 280.8078,
  280.4443, 280.9242,
  280.8122, 280.1313,
  280.9221, 280.3488,
  280.0775, 280.8904,
  280.1893, 280.7331,
  280.7415, 280.8027,
  280.7614, 280.2234,
  280.5485, 280.8253,
  280.6991, 280.8816,
  280.5189, 280.2721,
  280.7111, 280.5658,
  280.6756, 280.2242,
  280.0682, 280.3168,
  280.8618, 280.2755,
  280.9171, 280.5865,
  280.2315, 280.3508,
  280.5417, 280.3077,
  280.795, 280.3057,
  280.8251, 280.8671,
  280.3607, 280.0053,
  280.9492, 280.7696,
  280.6571, 280.5878,
  280.3359, 280.3247,
  280.9872, 280.9046,
  280.9969, 280.5405,
  280.6576, 280.4753,
  280.9091, 280.5366 ;
}

@durack1
Copy link
Contributor Author
durack1 commented Jun 12, 2024

It seems to be due to an interaction of "double compression" (i.e., two lossless codecs) with microscopic chunk sizes.

@czender, is this an API problem? That is, the API allows this combination of compression to be applied, generating a file whose entries may be valid but which the libraries have no way to interpret in order to reinflate the data?

@czender
Copy link
czender commented Jun 12, 2024

@durack1 This is not an API problem in the sense that netCDF4/HDF5 filters are designed to be "chainable" without limit. The library keeps track of the order the filters are applied and reinflates during reads by applying the inverse filters in reverse order. My impression is that this is a bug in the HDF filter code that is exposed by using multiple filters at the extreme limits of small chunksizes. Remember that chunks are the unit for compression, and must be read in their entirety. So a chunksize of two integers = 8 bytes means that the overhead for the metadata to describe the compression characteristics for a chunk is probably much larger than the chunk itself. Compressing data in units of two integers at a time makes no sense. This may stress the codecs in unforeseen/untested ways. People debate what the optimal chunksize is. (FWIW, NCO avoids compressing variables with chunksizes < 2*filesystem_block_size or 8192 B, whichever is smaller).
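
To make that heuristic concrete, here is a minimal sketch in C of a chunk-size guard, assuming the netCDF-C API; the 8192 B cap and the caller-supplied fs_block_size are illustrative assumptions, not actual NCO or CMOR code:

#include <netcdf.h>
#include <stddef.h>

/* Sketch: skip compression for tiny chunks, in the spirit of the NCO
 * heuristic described above. Threshold and fs_block_size are assumptions. */
static int maybe_deflate(int ncid, int varid, size_t type_size,
                         const size_t *chunksizes, int ndims,
                         size_t fs_block_size, int deflate_level)
{
    size_t chunk_bytes = type_size;
    for (int i = 0; i < ndims; i++)
        chunk_bytes *= chunksizes[i];

    size_t threshold = 2 * fs_block_size;
    if (threshold > 8192)
        threshold = 8192;           /* whichever is smaller */

    if (chunk_bytes < threshold)
        return NC_NOERR;            /* metadata overhead would dominate */

    /* Shuffle on, DEFLATE at the requested level */
    return nc_def_var_deflate(ncid, varid, 1, 1, deflate_level);
}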

@mauzey1
Copy link
Collaborator
mauzey1 commented Jun 12, 2024

Given the "double compression" issue, should we just go with exposing the quantize function in CMOR 3.9 and save the zstandard function for future CMOR development? The quantize doesn't seem to break anything but we should make tests to confirm that.

@durack1
Copy link
Contributor Author
durack1 commented Jun 12, 2024

Given the "double compression" issue, should we just go with exposing the quantize function in CMOR 3.9 and save the zstandard function for future CMOR development? The quantize doesn't seem to break anything but we should make tests to confirm that.

@czender I'd appreciate your take on this, it seems we might have work ahead to get these both working, or do you see a tangible path forward?

@taylor13
Copy link
Collaborator

Yes, let's get Charlie's advice on this before proceeding. Is there a way we can get good compression without running into this problem? Perhaps by preventing users from "micro-chunking" the data?

@czender
Copy link
czender commented Jun 12, 2024

First, the quantize functionality seems orthogonal to all of this. No one has reported any issues with it. So it seems like CMOR can continue to test and then implement quantize functionality in its own branch (branch #1).
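
For reference, enabling quantization through the netCDF-C 4.9.x API is a single define-mode call; a minimal sketch, assuming Granular BitRound with a caller-chosen number of significant digits:

#include <stdio.h>
#include <netcdf.h>

/* Sketch: apply Granular BitRound quantization with nsd significant
 * digits. Must be called in define mode, before nc_enddef(). */
static int quantize_var(int ncid, int varid, int nsd)
{
    int err = nc_def_var_quantize(ncid, varid, NC_QUANTIZE_GRANULARBR, nsd);
    if (err != NC_NOERR)
        fprintf(stderr, "nc_def_var_quantize: %s\n", nc_strerror(err));
    return err;
}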

Second, the results of @mauzey1's tests show the HDF bug is only triggered by variables that are "doubly compressed", but no one knows exactly what triggers it. In fact the bug might occur with any two lossless compressors (e.g., bzip2 and DEFLATE) operating on small chunks and may have nothing to do with Zstandard per se. Since "double compression" is a silly, slow, and non-productive waste of computer resources, there's no reason for CMOR to do it. So my suggestion is to implement the Zstandard functionality in a CMOR branch that has been modified to prevent "double compressing" files. E.g., when CMOR receives a DEFLATE'd file to process, the new behavior should be to fully decompress it before re-compressing it with another codec (e.g., Zstandard), rather than just applying a new codec on top of the old. This would be branch #2.
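
A rough sketch of that re-compression behavior, assuming the netCDF-C API (nc_def_var_zstandard is declared in netcdf_filter.h in netCDF-C >= 4.9.0); the in_*/out_* handles are illustrative:

#include <netcdf.h>
#include <netcdf_filter.h>

/* Sketch: define the output variable with Zstandard only, never stacking
 * it on top of the source file's DEFLATE. Data read back through the
 * netCDF API arrives fully inflated, so only the new codec applies. */
static int recompress_with_zstd(int in_ncid, int in_varid,
                                int out_ncid, int out_varid, int level)
{
    int shuffle = 0, deflate = 0, dlevel = 0;

    /* Inspect (but do not copy) the source variable's DEFLATE settings */
    nc_inq_var_deflate(in_ncid, in_varid, &shuffle, &deflate, &dlevel);

    return nc_def_var_zstandard(out_ncid, out_varid, level);
}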

Orthogonal to that, I suggest creating and testing a CMOR branch that always applies the Shuffle filter prior to lossless compression unless explicitly instructed not to. This would be branch #3.
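
Enabling Shuffle without DEFLATE is possible through the same deflate call, which is presumably how branch #3 would do it; a minimal sketch:

#include <netcdf.h>

/* Sketch: turn on the Shuffle filter alone by passing deflate=0; a
 * subsequent lossless codec then compresses the shuffled bytes. */
static int enable_shuffle_only(int ncid, int varid)
{
    return nc_def_var_deflate(ncid, varid, /*shuffle=*/1, /*deflate=*/0, 0);
}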

Branches 1, 2, and 3 are orthogonal and could be released in sequence or combined into one release if the testing goes smoothly.

@mauzey1
Copy link
Collaborator
mauzey1 commented Jun 12, 2024

Looking at the slides again I saw this point.

Lossy "pre-compression" must use netCDF-supported quantization algorithms (BitRound or Granular BitRound) with appropriate CF metadata. Retain full precision for "grid variables" like bounds, coordinates, cell-measures, grid-mapping, formulae.

So we should not be applying quantization and zstandard compression to the coordinates, bounds, z-factors, etc. I ended up adding the quantize and zstandard calls at the places in CMOR where deflate was being applied, which included the "grid variables." Maybe restricting quantization and zstandard compression to the dataset variable will lessen the chance of the HDF error occurring, as in the sketch below.
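
A hedged sketch of that restriction, where is_grid_variable is a hypothetical flag (CMOR's real bookkeeping differs):

#include <netcdf.h>
#include <netcdf_filter.h>

/* Sketch: quantize and Zstandard-compress only the main dataset variable;
 * is_grid_variable is a hypothetical flag, not an existing CMOR symbol. */
static int set_var_filters(int ncid, int varid, int is_grid_variable,
                           int nsd, int zstd_level)
{
    int err;

    if (is_grid_variable)
        return NC_NOERR;   /* full precision, no lossy or Zstandard filters */

    err = nc_def_var_quantize(ncid, varid, NC_QUANTIZE_GRANULARBR, nsd);
    if (err != NC_NOERR)
        return err;
    return nc_def_var_zstandard(ncid, varid, zstd_level);
}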

I'll still follow @czender's advice in #725 (comment).

@taylor13
Copy link
Collaborator

Nicely spotted, @mauzey1. Hopefully (and perhaps naively, coming from me) this will solve the problem.

@czender
Copy link
czender commented Jun 12, 2024

Thank you for reading the slides carefully, @mauzey1. The intent is to prevent the quantization of "grid variables", not to prevent their lossless compression. As you say, preventing their compression would reduce the chance of an HDF error. Moreover many grid variables (lat, lon, ...) for rectangular grids are too small to benefit from compression. However others (e.g., cell_measures = area) can be as large as any 2D geophysical field and thus definitely do benefit from compression. And for unstructured grids, many grid variables (lat, lon, ...) can be as large as the 2D geophysical fields, if not larger (cell bounds arrays for polygonal meshes). So be careful not to throw out the baby with the bathwater :)

@taylor13
Copy link
Collaborator

Yes, @czender is quite right about the size of these grid-related fields. Still, for CMIP, we almost invariably request multiple (100's of) time samples and often multiple (10's of) model levels, which multiplies the size of the 2-d spatial field by a factor of more than 100 (perhaps 1000's) to give the size of the data array of interest. That means that, relative to the data array of interest, the benefit of compressing these "coordinate fields" is small (typically less than 1%). Would we really care if we threw out the baby with the bathwater? (oooh, that sounds really mean; don't take it literally)

@czender
Copy link
czender commented Jun 13, 2024

@taylor13 When you put it that way, I agree with you. The "grid variables" may be comparable to a 2D geophysical field in the horizontal dimensions, but they are (so far, at least) constant in time and so usually tiny by comparison to full geophysical timeseries. So chuck out the baby too :)

@mauzey1
Copy link
Collaborator
mauzey1 commented Jul 17, 2024

@czender

When applying zstandard compression, is there a zstandard level that applies no compression similar to how setting the deflate level to 0 applies no compression? I tried using 0 as the zstandard level but that also applies compression. I'm planning to keep deflate at level 1 as the default and not have zstandard enabled at the beginning.

@czender
Copy link
czender commented Jul 18, 2024

@mauzey1 Good to hear you're making progress on this.

When applying zstandard compression, is there a zstandard level that applies no compression similar to how setting the deflate level to 0 applies no compression?

Not to my knowledge

I tried using 0 as the zstandard level but that also applies compression.

Same happened to me

I'm planning to keep deflate at level 1 as the default and not have zstandard enabled at the beginning.

FWIW, somewhere in the Zstandard filter code I read that a good default level for Zstandard is 3, so that's what NCO uses as default.

@mauzey1
Copy link
Collaborator
mauzey1 commented Jul 18, 2024

I was thinking of keeping the current default of deflate level 1 without shuffle. If the user wants zstandard, they can enable it by setting the zstandard level while also setting the deflate level to 0 and enabling shuffle. In my testing, having both zstandard and deflate enabled adversely affects compression efficiency.

If zstandard compression were on by default, then there would need to be a way to disable it for users who prefer deflate, and I do not know of a way to disable it once the zstandard level has been set.
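
In code, the proposed defaults might look something like this sketch (zstd_requested and the level values are assumptions about the eventual CMOR configuration, not its actual API):

#include <netcdf.h>
#include <netcdf_filter.h>

/* Sketch of the proposed policy: DEFLATE level 1 by default; if the user
 * opts into Zstandard, disable DEFLATE and enable Shuffle instead. */
static int apply_compression(int ncid, int varid,
                             int zstd_requested, int zstd_level)
{
    if (zstd_requested) {
        int err = nc_def_var_deflate(ncid, varid, 1, 0, 0); /* shuffle only */
        if (err != NC_NOERR)
            return err;
        return nc_def_var_zstandard(ncid, varid, zstd_level);
    }
    return nc_def_var_deflate(ncid, varid, 0, 1, 1);        /* current default */
}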

@taylor13
Copy link
Collaborator

Not knowing much about this, my instincts say that "no compression" should be the default when files are written. Turning on compression should, by default, invoke it at some level that balances read/write performance against file size for the kinds of files we usually deal with. Of course, the user should be able to set the compression/shuffle parameters to any acceptable values.
But that's probably what you've already implemented.

@czender
Copy link
czender commented Jul 18, 2024

Agreed. Default should be no compression at all. If the user specifies a compression level without naming the codec, then NCO assumes the codec is DEFLATE and applies it with Shuffle. (Shuffle increases the compression ratio by ~15% for both DEFLATE and Zstandard.) If the user specifies a codec but not a compression level, then NCO uses a default compression level of 1 for DEFLATE and 3 for Zstandard. The user must explicitly request multiple lossless codecs because that's usually a bad idea. When copying files, NCO preserves the codecs in the source file unless the copy command includes an explicit compression option, in which case the input compression codecs are discarded and only the new codecs are applied. What I intended to convey yesterday is that the Zstandard source code recommends a default compression level of 3 as offering a good tradeoff between speed and efficiency. Other organizations (e.g., Amazon: https://docs.aws.amazon.com/athena/latest/ug/compression-support-zstd-levels.html) also default to Zstandard level 3, though I'd guess that level 1 would not be too different: slightly faster compression yielding a slightly larger file.
