C++20 idiomatic APIs for the Apache Arrow Columnar Format

sparrow

Introduction

sparrow is a C++ implementation of the Apache Arrow Columnar Format. It provides array structures with idiomatic APIs and convenient conversions to and from the Arrow C data interface.

sparrow requires a modern C++ compiler supporting C++20.

Installation

Package managers

We provide a package for the mamba (or conda) package manager:

mamba install -c conda-forge sparrow

Install from sources

sparrow has a few dependencies that you can install in a mamba environment:

mamba env create -f environment-dev.yml
mamba activate sparrow

You can then create a build directory, then configure, build, and install the project with CMake:

mkdir build
cd build
cmake .. \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_TESTS=ON \
    -DBUILD_DOCS=ON
make install

Usage

Requirements

Compilers:

  • Clang 18 or higher
  • GCC 12 or higher
  • Apple Clang 16 or higher
  • MSVC 19.41 or higher

Initialize data with sparrow and extract C data structures

#include "sparrow/sparrow.hpp"
namespace sp = sparrow;

sp::primitive_array<int> ar = { 1, 3, 5, 7, 9 };
auto [arrow_array, arrow_schema] = sp::extract_arrow_structures(std::move(ar));
// Use arrow_array and arrow_schema as you need (serialization, passing it to
// a third party library)
// ...
// You are responsible for releasing the structure in the end
arrow_array.release(&arrow_array);
arrow_schema.release(&arrow_schema);
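
The manual release step above follows the release-callback protocol of the Arrow C data interface: the consumer calls `release` once, and the callback marks the structure as released by setting `release` to null. The sketch below is not part of sparrow. It uses a stand-in struct with the same field layout as `ArrowArray`, a hypothetical `array_guard` class, and a toy producer (`make_toy_array`, `toy_release`) to show how that call could be tied to scope exit, assuming you prefer RAII over a manual call.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

namespace release_sketch
{
    // Stand-in with the same field layout as ArrowArray in the Arrow C data
    // interface. Real code would use the definition that ships with sparrow.
    struct ArrowArray
    {
        std::int64_t length;
        std::int64_t null_count;
        std::int64_t offset;
        std::int64_t n_buffers;
        std::int64_t n_children;
        const void** buffers;
        ArrowArray** children;
        ArrowArray* dictionary;
        void (*release)(ArrowArray*);
        void* private_data;
    };

    // Hypothetical guard: takes over a structure and calls its release
    // callback exactly once, when the guard goes out of scope.
    class array_guard
    {
    public:

        // Taking over a structure means copying it and marking the source
        // as released, so that nobody releases it a second time.
        explicit array_guard(ArrowArray&& source) noexcept
            : m_array(source)
        {
            source.release = nullptr;
        }

        array_guard(array_guard&& other) noexcept
            : m_array(other.m_array)
        {
            other.m_array.release = nullptr;
        }

        array_guard(const array_guard&) = delete;
        array_guard& operator=(const array_guard&) = delete;
        array_guard& operator=(array_guard&&) = delete;

        ~array_guard()
        {
            // A null release pointer means "already released or moved from".
            if (m_array.release != nullptr)
            {
                m_array.release(&m_array);
            }
        }

        ArrowArray& get() noexcept { return m_array; }

    private:

        ArrowArray m_array;
    };

    // Toy release callback for the demonstration: counts calls through
    // private_data, then marks the structure as released.
    inline void toy_release(ArrowArray* array)
    {
        assert(array != nullptr && array->private_data != nullptr);
        ++*static_cast<int*>(array->private_data);
        array->release = nullptr;
    }

    // Toy producer: an empty structure that only carries a length and the
    // counting release callback.
    inline ArrowArray make_toy_array(std::int64_t length, int* release_counter)
    {
        ArrowArray array = {};
        array.length = length;
        array.release = &toy_release;
        array.private_data = release_counter;
        return array;
    }
}
```

With such a guard, a structure obtained by value would be released when the guard leaves scope, and moving the guard transfers that responsibility without a double release. The same idea would apply to `ArrowSchema`, which has its own `release` callback.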

Initialize data with sparrow and use C data structures

#include "sparrow/sparrow.hpp"
namespace sp = sparrow;

sp::primitive_array<int> ar = { 1, 3, 5, 7, 9 };
// Caution: get_arrow_structures returns pointers, not values
auto [arrow_array, arrow_schema] = sp::get_arrow_structures(ar);
// Use arrow_array and arrow_schema as you need (serialization, passing it to
// a third party library)
// ...
// do NOT release the C structures in the end, the "ar" variable will do it for you

Read data from somewhere and pass it to sparrow

#include "sparrow/sparrow.hpp"
#include "third-party-lib.hpp"
namespace sp = sparrow;
namespace tpl = third_party_library;

ArrowArray array;
ArrowSchema schema;
tpl::read_arrow_structures(&array, &schema);

sp::array ar(&array, &schema);
// Use ar as you need
// ...
// You are responsible for releasing the structure in the end
array.release(&array);
schema.release(&schema);

Read data from somewhere and move it into sparrow

#include "sparrow/sparrow.hpp"
#include "third-party-lib.hpp"
namespace sp = sparrow;
namespace tpl = third_party_library;

ArrowArray array;
ArrowSchema schema;
tpl::read_arrow_structures(&array, &schema);

sp::array ar(std::move(array), std::move(schema));
// Use ar as you need
// ...
// do NOT release the C structures in the end, the "ar" variable will do it for you
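
The last two snippets differ only in who ends up owning the C structures: passing pointers leaves ownership with the caller, while moving the structures hands it to the sparrow array. The sketch below illustrates that distinction and is not sparrow's implementation. `c_array` is a trimmed stand-in for `ArrowArray`, and `toy_array`, `make_toy` and `toy_release` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

namespace ownership_sketch
{
    // Trimmed stand-in for ArrowArray: only the fields this sketch needs.
    struct c_array
    {
        std::int64_t length;
        void (*release)(c_array*);
        void* private_data;
    };

    // Hypothetical wrapper with two construction modes.
    class toy_array
    {
    public:

        // Borrowing mode: keep a pointer to the caller's structure.
        // The caller stays responsible for calling release.
        explicit toy_array(c_array* borrowed) noexcept
            : m_storage()
            , m_ptr(borrowed)
            , m_owns(false)
        {
        }

        // Owning mode: copy the structure and mark the source as released,
        // so that only this object will call release.
        explicit toy_array(c_array&& taken) noexcept
            : m_storage(taken)
            , m_ptr(&m_storage)
            , m_owns(true)
        {
            taken.release = nullptr;
        }

        // m_ptr may point into this object, so copies and moves are disabled
        // to keep the sketch short.
        toy_array(const toy_array&) = delete;
        toy_array& operator=(const toy_array&) = delete;

        ~toy_array()
        {
            if (m_owns && m_storage.release != nullptr)
            {
                m_storage.release(&m_storage);
            }
        }

        std::int64_t size() const noexcept { return m_ptr->length; }
        bool owns() const noexcept { return m_owns; }

    private:

        c_array m_storage;
        c_array* m_ptr;
        bool m_owns;
    };

    // Toy release callback: counts calls through private_data, then marks
    // the structure as released.
    inline void toy_release(c_array* array)
    {
        assert(array != nullptr && array->private_data != nullptr);
        ++*static_cast<int*>(array->private_data);
        array->release = nullptr;
    }

    // Toy producer standing in for a third-party reader.
    inline c_array make_toy(std::int64_t length, int* release_counter)
    {
        c_array array = {};
        array.length = length;
        array.release = &toy_release;
        array.private_data = release_counter;
        return array;
    }
}
```

In borrowing mode the wrapper must not outlive the structure it points to, and the caller releases the structure afterwards. In owning mode the source is left with a null `release` pointer, which signals that it must not be released again.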

Documentation

The documentation (currently being written) can be found at https://man-group.github.io/sparrow/index.html

Acknowledgements

This development has been funded as part of a collaboration between ArcticDB, Bloomberg, and QuantStack.

License

This software is licensed under the Apache License 2.0. See the LICENSE file for details.