Goal: Ease usage of HDF5 compression filters from the Python programming language with h5py:
hdf5plugin packages a set of HDF5 compression filters (namely: blosc, bitshuffle, lz4, FCIDECOMP, ZFP, Zstandard) and makes them usable from the Python programming language through h5py.
Presenter: Thomas VINCENT
LEAPS-INNOV WP7 Meeting, October 11, 2021
from h5glance import H5Glance # Browsing HDF5 files
H5Glance("data.h5")
import h5py # Pythonic HDF5 wrapper: https://docs.h5py.org/
h5file = h5py.File("data.h5", mode="r") # Open HDF5 file in read mode
data = h5file["/data"][()] # Access HDF5 dataset "/data"
import matplotlib.pyplot as plt # Plotting library
plt.imshow(data); plt.colorbar() # Display data
data = h5file["/compressed_data_blosc"][()] # Access compressed dataset
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-4-07c82b2002f5> in <module>
----> 1 data = h5file["/compressed_data_blosc"][()] # Access compressed dataset

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/venv/py37env/lib/python3.7/site-packages/h5py/_hl/dataset.py in __getitem__(self, args, new_dtype)
    760         mspace = h5s.create_simple(selection.mshape)
    761         fspace = selection.id
--> 762         self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
    763
    764         # Patch up the output for NumPy

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.read()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

OSError: Can't read data (can't open directory: /usr/local/hdf5/lib/plugin)
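A read error like this one usually means the dataset was written with a filter that libHDF5 cannot locate. The following is a minimal sketch, not part of the original slides, of how one might list the filters attached to a dataset and check whether each is available before reading. The helper name `dataset_filters` and the in-memory demo file are illustrative assumptions. The demo uses the built-in gzip filter so that it runs without any plugin.

```python
import h5py
import numpy as np


def dataset_filters(dataset):
    """Return (filter_id, name, available) for each filter of an h5py dataset.

    Hypothetical helper for illustration, built on h5py's low-level API.
    """
    dcpl = dataset.id.get_create_plist()  # Dataset creation property list
    found = []
    for index in range(dcpl.get_nfilters()):
        filter_id, _flags, _values, name = dcpl.get_filter(index)
        if isinstance(name, bytes):
            name = name.decode(errors="replace")
        found.append((filter_id, name, bool(h5py.h5z.filter_avail(filter_id))))
    return found


# Demo on an in-memory file (nothing is written to disk)
with h5py.File("demo_filters.h5", "w", driver="core", backing_store=False) as demo:
    dset = demo.create_dataset(
        "data", data=np.arange(100).reshape(10, 10), compression="gzip"
    )
    filters_found = dataset_filters(dset)

print(filters_found)
```

Applied to "/compressed_data_blosc", a helper of this kind would be expected to report filter 32001 (blosc) as unavailable until the plugin is loaded.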
hdf5plugin usage
To enable reading compressed datasets not supported by libHDF5 and h5py: install hdf5plugin and import it.
%%bash
pip3 install hdf5plugin
# Or: conda install -c conda-forge hdf5plugin
import hdf5plugin
data = h5file["/compressed_data_blosc"][()] # Access dataset
plt.imshow(data); plt.colorbar() # Display data
h5file.close() # Close the HDF5 file
When writing datasets with h5py, compression can be specified with h5py.Group.create_dataset:
# Create a dataset with h5py without compression
h5file = h5py.File("new_file_uncompressed.h5", mode="w")
h5file.create_dataset("/data", data=data)
h5file.close()
# Create a compressed dataset
h5file = h5py.File("new_file_blosc_bitshuffle_lz4.h5", mode="w")
h5file.create_dataset(
    "/compressed_data",
    data=data,
    compression=32001, # blosc HDF5 filter identifier
    # options: 4 reserved values, then level=5, shuffle=2 (bitshuffle), compressor=1 (lz4)
    compression_opts=(0, 0, 0, 0, 5, 2, 1),
)
h5file.close()
hdf5plugin provides some helpers to ease dealing with compression filters and options:
h5file = h5py.File("new_file_blosc_bitshuffle_lz4.h5", mode="w")
h5file.create_dataset(
    "/compressed_data",
    data=data,
    **hdf5plugin.Blosc(
        cname='lz4',
        clevel=5,
        shuffle=hdf5plugin.Blosc.BITSHUFFLE),
)
h5file.close()
hdf5plugin.Blosc?
H5Glance("new_file_blosc_bitshuffle_lz4.h5")
h5file = h5py.File("new_file_blosc_bitshuffle_lz4.h5", mode="r")
plt.imshow(h5file["/compressed_data"][()]); plt.colorbar()
h5file.close()