OM-File-Format library

The Open-Meteo OM-File format is designed for efficient storage and distribution of multi-dimensional environmental data. By chunking, compressing, and indexing the data, OM-Files enable cloud-native random reads while minimizing file sizes. The format supports hierarchical data structures similar to NetCDF or HDF5.

This library implements the format in C, with a high-level Swift abstraction integrated directly into the Open-Meteo weather API. Future bindings for Python, TypeScript, and Rust are planned.

Note: This library is in a highly experimental stage. While Open-Meteo has used the format for years, this standalone library was initiated in October 2024 to provide Python bindings. We aim to provide a robust Python library to access the Open-Meteo weather database provided on S3 through an AWS open-data sponsorship.

Features:

Chunked, compressed multi-dimensional arrays
High-speed integer compression: Fast compression speed at high compression ratios
Lossless and lossy compression: Adjustable accuracy via scale factors to further reduce data size
Optimized for cloud-native random IO access: Supports IO merging and splitting
Sequential file writing: Enables streaming write to cloud storage; metadata is stored at the file’s end
Sans-IO C implementation: Designed for async support and concurrency in higher-level libraries
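
The scale-factor idea behind the lossy mode can be illustrated with a minimal sketch. This is not the library's API: the function names `quantize` and `dequantize` and the sample values are assumptions chosen for illustration. Floats are multiplied by a scale factor and rounded to integers before integer compression, so a larger scale factor keeps more precision at the cost of larger integers.

```python
def quantize(values, scale_factor):
    """Lossy step: scale each float and round to the nearest integer."""
    return [round(v * scale_factor) for v in values]


def dequantize(integers, scale_factor):
    """Inverse step: map the stored integers back to floats."""
    return [i / scale_factor for i in integers]


temperatures = [12.34, 12.41, 12.58, 12.73]  # sample values in degrees Celsius
scale_factor = 20                             # resolution of 1/20 = 0.05

stored = quantize(temperatures, scale_factor)
restored = dequantize(stored, scale_factor)

# The rounding error is bounded by half the resolution, here 0.025.
max_error = max(abs(a - b) for a, b in zip(temperatures, restored))

print(stored)  # [247, 248, 252, 255]
print(max_error <= 0.5 / scale_factor)
```

With a scale factor of 20, each value is stored with a resolution of 0.05, which is the kind of accuracy/size trade-off the feature list refers to.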

Core Principles:

Chunked Data Storage: OM-Files partition large data arrays into individually compressed chunks, with a lookup table tracking chunk positions. This allows reading and decompressing only the required chunks—ideal for use cases like meteorological datasets, where users often query specific regions rather than global data.
Optimized for Meteorological Use Cases: In weather reanalysis (e.g., Copernicus ERA5-Land), global datasets at 0.1° spatial resolution can reach massive scales. A single timestep with 3600 x 1800 pixels (~25 MB using 32-bit floats) grows to about 211.5 GiB (227 GB) for one year of hourly data (8760 hours). Over decades, and across thousands of variables, datasets easily reach petabyte scales. Traditional GRIB files compress well, but accessing a specific subset requires decompressing the entire file. OM-Files, in contrast, allow direct access to localized data (e.g., a single country or city) by using small chunk sizes (e.g., 3 x 3 x 120).
High-Speed Data Access: OM-Files minimize data transfer and decompression overhead, enabling extremely fast reads while maintaining strong compression ratios. Compression is based on FastPFOR with SIMD instructions and reaches throughput in the GB/s range. This lets the Open-Meteo weather API deliver forecasts in under a millisecond and enables large-scale data analysis without requiring users to download hundreds of gigabytes of GRIB files.
Improved Compression Efficiency: Chunking exploits spatial and temporal data correlations to enhance compression. Weather data, for instance, shows gradual changes across locations and time. Optimal chunking dimensions (compressing 1,000–2,000 values per chunk with a last dimension >100) strike a balance between compression efficiency and performance. Too many chunks reduce both.
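
The chunking principle and the size estimate above can be sketched as follows. This is a hedged illustration, not the library's implementation: the helper names `chunks_for_read` and `chunk_number`, the dimension order (latitude, longitude, time), and the row-major numbering of chunks in the lookup table are all assumptions.

```python
from itertools import product


def chunks_for_read(dims, chunks, offset, count):
    """Return the chunk coordinates that overlap the requested region."""
    ranges = []
    for dim, chunk, off, cnt in zip(dims, chunks, offset, count):
        if cnt <= 0 or off < 0 or off + cnt > dim:
            raise ValueError("read region is outside the array")
        first = off // chunk
        last = (off + cnt - 1) // chunk
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


def chunk_number(coord, dims, chunks):
    """Linear chunk index, assuming row-major order of chunks."""
    index = 0
    for c, dim, chunk in zip(coord, dims, chunks):
        chunks_in_dim = -(-dim // chunk)  # ceiling division
        index = index * chunks_in_dim + c
    return index


# One year of hourly data on a 0.1 degree global grid, 3 x 3 x 120 chunks.
dims = (1800, 3600, 8760)
chunks = (3, 3, 120)

# Read 240 hours for a 2 x 2 pixel area: only two chunks are touched.
touched = chunks_for_read(dims, chunks, offset=(900, 1800, 0), count=(2, 2, 240))
print(touched)
print([chunk_number(c, dims, chunks) for c in touched])

# Uncompressed size of the full year with 32-bit floats.
raw_bytes_per_year = 1800 * 3600 * 4 * 8760
print(round(raw_bytes_per_year / 2**30, 1), "GiB")
```

In this example, a reader would need to fetch and decompress 2 of roughly 52.6 million chunks, which is why small chunks suit point or regional queries.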

ToDo:

Document Swift functions
Document C functions
Support for string attributes and string-arrays
Build Python library
Examples of how to use Python fsspec with caching to access OM-Files on S3
Build a web interface that makes the entire Open-Meteo weather database accessible, with automatic Python code generation

Swift Library Interface

Swift code can be found in ./Swift with tests in ./Tests

TODO: Document functions + example

C Library Interface

The C code is available in ./c

TODO: Document C functions

Data Hierarchy Model:

The file trailer contains the position of the root Variable
Each Variable has a datatype and a payload. For example, an Int16 stores its number as a 2-byte payload. An array stores the lookup table position and the array dimension information; the actual compressed array data is stored at the beginning of the file.
Each Variable has a name
Each Variable has 0...N child Variables, so Variables resemble a key-value store where each value can itself have N children.

A Variable can be of different types:

None: Does not contain any value. Useful to define a group
Scalar of type Int8, Int16, Int32, Int64, Float, Double, etc.
Array of type Int8, Int16, etc., with dimensions, chunks and compression type information
String (to be implemented)
String Array (to be implemented)

Examples

The following examples show how data with attributes can be encoded in an OM-File

Example 1: Plain array inside an OM-File:

Root: Name="temperature_2m" Type=Float32-Array Dimensions=[720,1400,24] Chunks=[1,50,24]

Example 2: Array with attributes

Root: Name="temperature_2m" Type=Float32-Array Dimensions=[720,1400,24] Chunks=[1,50,24]
|- Name="dimension_names" Type=String-Array Dimensions=[3]
|- Name="long_name" Type=String Value="Temperature 2 metres above ground"
|- Name="unit" Type=String Value="Celsius"
|- Name="height" Type=Int32 Value=2

Example 3: Multiple Arrays with attributes

Root: Type=None
|- Name="temperature_2m" Type=Float32-Array Dimensions=[720,1400,24] Chunks=[1,50,24]
  |- Name="dimension_names" Type=String-Array Dimensions=[3]
  |- Name="long_name" Type=String Value="Temperature 2 metres above ground"
  |- Name="unit" Type=String Value="Celsius"
  |- Name="height" Type=Int32 Value=2
|- Name="relative_humidity_2m" Type=Float32-Array Dimensions=[720,1400,24] Chunks=[1,50,24]
  |- Name="dimension_names" Type=String-Array Dimensions=[3]
  |- Name="long_name" Type=String Value="Relative Humidity 2 metres above ground"
  |- Name="unit" Type=String Value="Percentage"
  |- Name="height" Type=Int32 Value=2
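
Example 3 can be mirrored in memory with a small tree structure. The sketch below is only an illustration of the hierarchy, not the Swift or C interface: the `Variable` dataclass, the `find` helper, and `array_with_attributes` are hypothetical names, and the `dimension_names` child is left out because its values are not given above.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Variable:
    name: str
    data_type: str                          # e.g. "None", "Int32", "Float32-Array"
    value: Any = None                       # scalar or string payload
    dimensions: Optional[List[int]] = None  # only for arrays
    chunks: Optional[List[int]] = None      # only for arrays
    children: List["Variable"] = field(default_factory=list)

    def number_of_children(self) -> int:
        return len(self.children)

    def get_child(self, n: int) -> "Variable":
        return self.children[n]

    def find(self, name: str) -> Optional["Variable"]:
        """Hypothetical helper: look up a direct child by name."""
        return next((c for c in self.children if c.name == name), None)


def array_with_attributes(name: str, long_name: str, unit: str) -> Variable:
    """Build one array Variable with the attributes used in the examples."""
    return Variable(
        name, "Float32-Array", dimensions=[720, 1400, 24], chunks=[1, 50, 24],
        children=[
            Variable("long_name", "String", value=long_name),
            Variable("unit", "String", value=unit),
            Variable("height", "Int32", value=2),
        ],
    )


# The root has Type=None and only groups the two arrays.
root = Variable("", "None", children=[
    array_with_attributes("temperature_2m", "Temperature 2 metres above ground", "Celsius"),
    array_with_attributes("relative_humidity_2m", "Relative Humidity 2 metres above ground", "Percentage"),
])

humidity = root.find("relative_humidity_2m")
print(root.number_of_children())
print(humidity.find("unit").value)
```

Because attributes are ordinary Variables, the same structure covers plain arrays (Example 1), arrays with attributes (Example 2), and groups of arrays (Example 3).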

Model

classDiagram
    Variable <|-- Variable
    Variable --|> Int8
    Variable --|> Int16
    Variable --|> String
    Variable --|> Array
    Trailer --|> Variable
    Variable : +String_name
    Variable : +Variable[]_children
    Variable : +Enum_data_type
    Variable : +Enum_compression_type
    Variable: +number_of_children()
    Variable: +get_child(int n)
    Variable: +get_name()
    class Trailer {
        +version
        +root_variable
    }
    class Int8{
      +Int8 value
      +read()
    }
    class Int16{
      +Int16 value
      +read()
    }
    class String{
      +String_value
      +read()
    }
    class Array{
        +Int64[]_dimensions
        +Int64[]_chunks
      +Int64_look_up_table_offset
      +Int64_look_up_table_size
      +read(offset:Int64[],count:Int64[])
    }

Legacy Binary Format:

Int16: magic number "OM"
Int8: version
Int8: compression type with filter
Float32: scalefactor
Int64: dim0 (slow)
Int64: dim1 (fast)
Int64: chunk dim0
Int64: chunk dim1
Array of 64-bit Integer: Offset lookup table
Blob: Data for each chunk, located via the offsets in the lookup table
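
The fixed part of the legacy layout above adds up to 40 bytes (2 + 1 + 1 + 4 + 4 x 8). The following sketch parses it with Python's `struct` module. It is an assumption-laden illustration rather than a reference reader: little-endian byte order, no padding between fields, the function name `parse_legacy_header`, and the sample values (including the version number) are not confirmed by this document.

```python
import struct

# Assumed layout: little-endian, fields packed without padding.
# 2s = magic "OM", B = version, B = compression type, f = scalefactor,
# 4q = dim0, dim1, chunk dim0, chunk dim1
LEGACY_HEADER = struct.Struct("<2sBBf4q")


def parse_legacy_header(data: bytes) -> dict:
    """Decode the fixed-size legacy header and derive the chunk count."""
    (magic, version, compression, scale_factor,
     dim0, dim1, chunk0, chunk1) = LEGACY_HEADER.unpack_from(data, 0)
    if magic != b"OM":
        raise ValueError("missing magic number 'OM'")
    chunks_dim0 = -(-dim0 // chunk0)  # ceiling division
    chunks_dim1 = -(-dim1 // chunk1)
    return {
        "version": version,
        "compression": compression,
        "scale_factor": scale_factor,
        "dimensions": (dim0, dim1),
        "chunks": (chunk0, chunk1),
        "number_of_chunks": chunks_dim0 * chunks_dim1,
        "lookup_table_start": LEGACY_HEADER.size,
    }


# Made-up sample header, used only to exercise the parser.
sample = LEGACY_HEADER.pack(b"OM", 2, 0, 20.0, 1800, 3600, 6, 100)
info = parse_legacy_header(sample)
print(info["number_of_chunks"], info["lookup_table_start"])
```

The chunk count tells a reader how many chunks the offset lookup table has to describe before the chunk data begins.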

New Binary Format:

3 byte: header (magic number "OM" + version)
Blob: Compressed data and lookup table (LUT)
Blob: Binary encoded metadata
24 byte: Trailer with address to root variable

Binary representation:

File header with magic number and version
File trailer with offsets and size of the root variable
Each Variable has attributes: data type (8 bit), compression type (8 bit), size_of_name (16 bit), count_of_attributes (32 bit)
Depending on the data type, these are followed by the payload for that data type
This is followed by the name as a string and, for each attribute, its offset and size
Typically all compressed data is at the beginning of the file, followed by all metadata and attributes (streaming write without ever seeking back!)

Header message:

Bytes 1-2	Byte 3
Magic number "OM"	Version

Trailer message:

Bytes 1-2	Byte 3	Byte 4	Bytes 5-8
Magic number "OM"	Version	Reserved	Reserved
Size of Root Variable (64 bit)
Offset of Root Variable (64 bit)
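
Because the metadata sits at the end of the file, a reader starts from the 24-byte trailer. The sketch below shows that first step under several assumptions that this document does not confirm: little-endian byte order, five reserved bytes after the version, 64-bit size and offset fields in the order listed above, and a made-up version number in the sample. `read_trailer` is a hypothetical helper name.

```python
import io
import struct

# Assumed layout: 2s = magic "OM", B = version, 5x = reserved bytes,
# Q = size of root variable, Q = offset of root variable (24 bytes total).
TRAILER = struct.Struct("<2sB5xQQ")


def read_trailer(f) -> dict:
    """Read the last 24 bytes of a seekable file-like object."""
    f.seek(-TRAILER.size, io.SEEK_END)
    magic, version, root_size, root_offset = TRAILER.unpack(f.read(TRAILER.size))
    if magic != b"OM":
        raise ValueError("missing magic number 'OM'")
    return {"version": version, "root_size": root_size, "root_offset": root_offset}


# Made-up file image: 3-byte header, 100 placeholder bytes, then the trailer.
image = b"OM\x03" + bytes(100) + TRAILER.pack(b"OM", 3, 40, 60)
f = io.BytesIO(image)

trailer = read_trailer(f)
print(trailer)

# The root variable message can then be fetched with a single ranged read.
f.seek(trailer["root_offset"])
root_message = f.read(trailer["root_size"])
print(len(root_message))
```

On object storage, the same two steps map to two ranged requests: one for the trailer and one for the root variable.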

Variable message:

Byte 1	Byte 2	Bytes 3-4	Bytes 5-8
Data Type	Compression Type	Size of name	Number of Children
Size of Value / LUT (only arrays and strings)
Offset of Value / LUT (only arrays)
Number of Dimensions (only arrays)
Scale Factor (float, only arrays)	Add Offset (float, only arrays)
N * Size of Child
N * Offset of Child
N * Dimension Length (only arrays)
N * Chunk Dimension Length (only arrays)
Bytes of value (scalar, string, not arrays)
Bytes of name
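
To make the variable message more concrete, here is a rough sketch that decodes a scalar variable. It covers only the scalar case and rests on assumptions not stated in this document: little-endian byte order, 64-bit child sizes and offsets, and a made-up numeric data type code. The name is taken from the end of the message using the size-of-name field, and whatever lies between the child tables and the name is treated as the value. `parse_scalar_variable` is a hypothetical helper name.

```python
import struct

# Fixed 8-byte prefix: data type (8 bit), compression type (8 bit),
# size of name (16 bit), number of children (32 bit).
FIXED = struct.Struct("<BBHI")


def parse_scalar_variable(message: bytes) -> dict:
    """Decode a scalar variable message (arrays and strings are not handled)."""
    data_type, compression, name_size, n_children = FIXED.unpack_from(message, 0)
    pos = FIXED.size
    # Assumption: child sizes and offsets are stored as 64-bit integers.
    child_sizes = struct.unpack_from(f"<{n_children}Q", message, pos)
    pos += 8 * n_children
    child_offsets = struct.unpack_from(f"<{n_children}Q", message, pos)
    pos += 8 * n_children
    name_start = len(message) - name_size
    return {
        "data_type": data_type,
        "compression": compression,
        "children": list(zip(child_offsets, child_sizes)),
        "value_bytes": message[pos:name_start],
        "name": message[name_start:].decode("utf-8"),
    }


# Made-up message: an Int32 attribute "height" = 2 with one child reference.
INT32_CODE = 3  # hypothetical data type code, for illustration only
message = (
    FIXED.pack(INT32_CODE, 0, len(b"height"), 1)
    + struct.pack("<Q", 16)    # size of child
    + struct.pack("<Q", 128)   # offset of child
    + struct.pack("<i", 2)     # value
    + b"height"                # name
)

variable = parse_scalar_variable(message)
height = struct.unpack("<i", variable["value_bytes"])[0]
print(variable["name"], height, variable["children"])
```

Each (offset, size) pair in `children` points at another variable message, so a reader can walk the hierarchy by following these pairs from the root.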
