HDF5.jl: Hierarchical data storage for the Julia ecosystem

07/26/2023, 4:00 PM — 4:30 PM UTC
26-100

Abstract:

HDF5.jl is a Julia package for reading and writing data using the Hierarchical Data Format version 5 (HDF5) C library. HDF5 is a flexible, self-describing format suitable for storing complex scientific data, and is used as a container for many other formats. This talk will give an overview of the HDF5 format and give an introduction and examples of basic usage of the HDF5.jl package. We will highlight some recent features and discuss future plans for the package.

Description:

As the name suggests, the HDF5 format allows storing data hierarchical layout of groups and datasets. It is self-describing, meaning that type and dimension metadata is stored in the file, alongside any custom metadata, known as attributes. Its flexibility means that it is widely used in many scientific domains, and as a container format by other libraries, including NetCDF, JLD.jl/JLD2.jl, PyTables and MATLAB’s MAT-files.

HDF5.jl is a Julia package for accessing HDF5 files, using the HDF5 C library maintained by The HDF Group. It provides a simple, high-level interface making it easy to save and load data, as well as a more flexible interface allowing users to take advantage of many of HDF5’s features. Although the HDF5.jl package has been around since 2012, we have recently undertaken some significant changes to improve the modularity and make available newer features.

Some recent feature additions to HDF5.jl package include:

  • Filter pipeline API that supports custom plugins and several advanced compression filter subpackages.
  • Distributed reading and writing with MPI.jl.
  • Reentrant API locks to prevent errors when accessing from multiple threads.
  • Virtual datasets, which support storing data across multiple HDF5 files.
  • Direct access to remote files stored on AWS S3.

Finally, we will discuss future plans:

  • A path for thread-parallel I/O operations with HDF5 files via the raw chunk API to access the byte-level layout of chunked datasets.
  • BinaryBuilder.jl provided binaries across all supported platforms.

Platinum sponsors

JuliaHub

Gold sponsors

ASML

Silver sponsors

Pumas AIQuEra Computing Inc.Relational AIJeffrey Sarnoff

Bronze sponsors

Jolin.ioBeacon BiosignalsMIT CSAILBoeing

Academic partners

NAWA

Local partners

Postmates

Fiscal Sponsor

NumFOCUS