ArrowDB is a teaching tool for exploring Apache Arrow and its tooling, both in the cloud and in the browser. It is written primarily in Rust.

| Crate | Description |
|---|---|
| arrow-db-core | The core ArrowDB database. |
| arrow-db-server | A Tonic server that leverages the Arrow Flight protocol. |
| arrow-db-client | A Rust client for querying the ArrowDB server. |
| arrow-db-wasm | A WebAssembly module for use in the ArrowDB browser. |
| arrow-db-browser | A React app for interacting with the ArrowDB server in the browser. |

ArrowDB is built on top of the Apache Arrow library in Rust. Arrow is a columnar format that is optimized for in-memory data processing and analytics. Full specifications for Arrow can be found at https://arrow.apache.org/docs/format/index.html.
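
To make "columnar" concrete, here is a minimal sketch in plain Rust that contrasts a row-oriented layout with a column-oriented one. It does not use the Arrow crate; `TripRow`, `TripColumns`, `to_columns`, and `total_distance` are hypothetical names chosen for illustration only.

```rust
/// Row-oriented layout: one struct per row, so fields are interleaved in memory.
#[derive(Debug, Clone, PartialEq)]
pub struct TripRow {
    pub id: i32,
    pub distance_km: f64,
}

/// Column-oriented layout: one contiguous vector per column.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct TripColumns {
    pub id: Vec<i32>,
    pub distance_km: Vec<f64>,
}

/// Transpose row-oriented data into the columnar layout.
pub fn to_columns(rows: &[TripRow]) -> TripColumns {
    TripColumns {
        id: rows.iter().map(|r| r.id).collect(),
        distance_km: rows.iter().map(|r| r.distance_km).collect(),
    }
}

/// Aggregate a single column. Only the `distance_km` buffer is scanned;
/// the `id` column is never read.
pub fn total_distance(cols: &TripColumns) -> f64 {
    cols.distance_km.iter().sum()
}
```

An analytic query such as a sum over one column reads a single contiguous buffer in the columnar layout, which is the access pattern that columnar formats like Arrow are designed around.
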
A good analog for a database table in Arrow is a RecordBatch. A RecordBatch is a two-dimensional, column-oriented collection of data that is defined by a Schema. The Schema lists the Fields of the RecordBatch, which play the role of columns in a database table: each Field gives a column's name, data type, and nullability. The values of each column are stored in an Array of a single data type, and all Arrays in a RecordBatch have the same length.
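
The relationship between a Schema, its Fields, and the column Arrays can be sketched with a toy model in plain Rust. This is not the Arrow crate's API: `ToyType`, `ToyField`, `ToyArray`, and `ToyBatch` are hypothetical types that only illustrate the kind of invariants a record batch maintains (one array per field, matching types, equal lengths).

```rust
/// Logical type of a column (a tiny subset, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToyType {
    Int32,
    Utf8,
}

/// Describes one column: its name and logical type.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyField {
    pub name: String,
    pub data_type: ToyType,
}

/// The values of one column, stored contiguously and all of one type.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyArray {
    Int32(Vec<i32>),
    Utf8(Vec<String>),
}

impl ToyArray {
    pub fn len(&self) -> usize {
        match self {
            ToyArray::Int32(v) => v.len(),
            ToyArray::Utf8(v) => v.len(),
        }
    }

    pub fn data_type(&self) -> ToyType {
        match self {
            ToyArray::Int32(_) => ToyType::Int32,
            ToyArray::Utf8(_) => ToyType::Utf8,
        }
    }
}

/// A schema (list of fields) plus one equal-length array per field.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyBatch {
    pub schema: Vec<ToyField>,
    pub columns: Vec<ToyArray>,
}

impl ToyBatch {
    /// Check that the columns agree with the schema before building the batch.
    pub fn try_new(schema: Vec<ToyField>, columns: Vec<ToyArray>) -> Result<Self, String> {
        if schema.len() != columns.len() {
            return Err(format!("expected {} columns, got {}", schema.len(), columns.len()));
        }
        for (field, column) in schema.iter().zip(&columns) {
            if field.data_type != column.data_type() {
                return Err(format!("column '{}' has the wrong type", field.name));
            }
        }
        let num_rows = columns.first().map_or(0, |c| c.len());
        if columns.iter().any(|c| c.len() != num_rows) {
            return Err("all columns must have the same length".to_string());
        }
        Ok(ToyBatch { schema, columns })
    }

    /// Number of rows, taken from the first column (0 for an empty batch).
    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }
}
```

Building a batch with an `id` column of `ToyArray::Int32` and a `name` column of `ToyArray::Utf8` succeeds only when both arrays have the same length and match the declared field types; otherwise `try_new` returns an error.
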
ArrowDB uses the Parquet format for disk persistence. Parquet is columnar, like Arrow, but is optimized for disk storage rather than in-memory processing. Parquet files can be read and written using the parquet crate. Much as Arrow data is divided into RecordBatches, a Parquet file is divided into Row Groups. Converting Arrow RecordBatches to Parquet files, and vice versa, is efficient in both time and space.
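
As a rough illustration of how rows map onto Row Groups, the sketch below splits a table of `num_rows` rows into consecutive row ranges of bounded size. It is plain Rust and does not call the parquet crate; `row_group_ranges` and its `max_rows_per_group` parameter are hypothetical names, and a real Parquet writer decides row group boundaries according to its own configuration.

```rust
/// Split `num_rows` rows into consecutive ranges of at most
/// `max_rows_per_group` rows each. The last range may be shorter.
pub fn row_group_ranges(
    num_rows: usize,
    max_rows_per_group: usize,
) -> Vec<std::ops::Range<usize>> {
    assert!(max_rows_per_group > 0, "a row group must hold at least one row");
    (0..num_rows)
        .step_by(max_rows_per_group)
        .map(|start| start..(start + max_rows_per_group).min(num_rows))
        .collect()
}
```

For example, ten rows with at most four rows per group yield the ranges `0..4`, `4..8`, and `8..10`. Within each such range, a columnar file stores the values column by column, which is why a row group corresponds naturally to a RecordBatch.
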
DataFusion is an extensible, parallel query execution engine built on top of Arrow. It offers both a DataFrame API and a SQL API; ArrowDB uses the SQL API to support SQL-like queries.
Arrow Flight RPC is a protocol for exchanging streams of Arrow RecordBatches over the wire. On the server side in Rust, Arrow Flight is implemented using Tonic, a gRPC framework for Rust. gRPC uses Protocol Buffers (protobuf) to define both the structure of the messages and the service interface.
On the client side in Rust, the FlightServiceClient is used to request and receive Arrow RecordBatches from the server.
WebAssembly (Wasm) is a portable bytecode format best known for running in the browser, though it is not limited to browsers. All modern browsers support it, so it can be used to run Rust code in the browser. Since Wasm in the browser doesn't have access to the file system, ArrowDB exists only in memory there. The arrow-db-wasm crate contains Rust code for interacting with Arrow data in the browser. The arrow-db-browser app is a React app that uses the arrow-db-wasm crate to manipulate Arrow data in the browser.