ddimaria/arrow-db

ArrowDB

ArrowDB is a teaching tool for learning about the power of Arrow and Arrow tooling in the cloud and in the browser. Rust is used as the primary programming language.

Workspace Members

| Crate | Description |
| --- | --- |
| arrow-db-core | The core ArrowDB database. |
| arrow-db-server | A Tonic server that leverages the Arrow Flight protocol. |
| arrow-db-client | A Rust client for querying the ArrowDB server. |
| arrow-db-wasm | A WebAssembly module for use in the ArrowDB browser app. |
| arrow-db-browser | A React app for interacting with the ArrowDB server in the browser. |

ArrowDB Fundamentals

ArrowDB is built on top of the Apache Arrow library in Rust. Arrow is a columnar format that is optimized for in-memory data processing and analytics. Full specifications for Arrow can be found at https://arrow.apache.org/docs/format/index.html.

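A good analog for a database table in Arrow is a RecordBatch. A RecordBatch is a two-dimensional collection of column-oriented data that is defined by a Schema. The Schema defines the Fields in the RecordBatch, which act as the column definitions of a database table. Each Field names a column and gives its data type and nullability; the column's values are stored in an Array of that single type, and every Array in a RecordBatch has the same length.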
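To make the Schema, Field, Array, and RecordBatch relationship concrete, here is a minimal sketch in plain Rust using only the standard library. The type names mirror Arrow's vocabulary, but they are hypothetical stand-ins written for illustration and are not the `arrow` crate's API. The `example_batch` helper and its `id` and `name` columns are likewise assumptions.

```rust
// Toy model of how a Schema, its Fields, and typed Arrays form a RecordBatch.
// Hypothetical stand-in types for illustration; NOT the `arrow` crate's API.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
}

/// Describes one column: its name, type, and whether nulls are allowed.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

/// An ordered list of fields, like the column definitions of a table.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

/// One column of values, all of the same type; `None` marks a null.
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Int32(Vec<Option<i32>>),
    Utf8(Vec<Option<String>>),
}

impl Array {
    pub fn len(&self) -> usize {
        match self {
            Array::Int32(v) => v.len(),
            Array::Utf8(v) => v.len(),
        }
    }

    pub fn data_type(&self) -> DataType {
        match self {
            Array::Int32(_) => DataType::Int32,
            Array::Utf8(_) => DataType::Utf8,
        }
    }

    pub fn null_count(&self) -> usize {
        match self {
            Array::Int32(v) => v.iter().filter(|x| x.is_none()).count(),
            Array::Utf8(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }
}

/// Equal-length columns that have been validated against a schema.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    schema: Schema,
    columns: Vec<Array>,
    num_rows: usize,
}

impl RecordBatch {
    /// Checks column count, types, lengths, and nullability before building.
    pub fn try_new(schema: Schema, columns: Vec<Array>) -> Result<Self, String> {
        if schema.fields.len() != columns.len() {
            return Err(format!(
                "schema has {} fields but {} columns were given",
                schema.fields.len(),
                columns.len()
            ));
        }
        let num_rows = columns.first().map_or(0, |c| c.len());
        for (field, column) in schema.fields.iter().zip(&columns) {
            if column.data_type() != field.data_type {
                return Err(format!("column '{}' has the wrong type", field.name));
            }
            if column.len() != num_rows {
                return Err(format!("column '{}' has the wrong length", field.name));
            }
            if !field.nullable && column.null_count() > 0 {
                return Err(format!("column '{}' must not contain nulls", field.name));
            }
        }
        Ok(RecordBatch { schema, columns, num_rows })
    }

    pub fn num_rows(&self) -> usize {
        self.num_rows
    }

    pub fn num_columns(&self) -> usize {
        self.columns.len()
    }

    pub fn column(&self, index: usize) -> &Array {
        &self.columns[index]
    }
}

/// Builds a small two-column batch (hypothetical `id` and `name` columns).
pub fn example_batch() -> Result<RecordBatch, String> {
    let schema = Schema {
        fields: vec![
            Field { name: "id".to_string(), data_type: DataType::Int32, nullable: false },
            Field { name: "name".to_string(), data_type: DataType::Utf8, nullable: true },
        ],
    };
    let ids = Array::Int32(vec![Some(1), Some(2), Some(3)]);
    let names = Array::Utf8(vec![Some("alice".to_string()), None, Some("carol".to_string())]);
    RecordBatch::try_new(schema, vec![ids, names])
}
```

Calling `example_batch()` yields a batch with three rows and two columns. Passing a column whose type or length disagrees with the schema returns an error, which is the same kind of validation a real Arrow RecordBatch performs on construction.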

Disk Persistence

ArrowDB uses the Parquet format for disk persistence. Parquet is a columnar format like Arrow, but it is optimized for disk storage rather than in-memory processing. Parquet files can be read and written using the Parquet crate. A Parquet file is divided into Row Groups, which play a role similar to RecordBatches in Arrow. Converting Arrow RecordBatches to Parquet files and vice versa is efficient in both time and space.
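The Row Group idea can be sketched without any Parquet dependency: a table's rows are cut into consecutive ranges, and each column is stored chunk by chunk within those ranges. The helper names and the `max_rows_per_group` parameter below are hypothetical and only illustrate the partitioning; they are not the Parquet crate's API, and real writers also encode and compress each chunk.

```rust
use std::ops::Range;

/// Splits `num_rows` rows into consecutive row-group ranges holding at most
/// `max_rows_per_group` rows each. Hypothetical helper for illustration.
pub fn row_group_ranges(num_rows: usize, max_rows_per_group: usize) -> Vec<Range<usize>> {
    assert!(max_rows_per_group > 0, "row groups must hold at least one row");
    (0..num_rows)
        .step_by(max_rows_per_group)
        .map(|start| start..(start + max_rows_per_group).min(num_rows))
        .collect()
}

/// Cuts one column into per-row-group chunks, mirroring how each row group
/// stores its own slice of every column.
pub fn chunk_column<T: Clone>(column: &[T], max_rows_per_group: usize) -> Vec<Vec<T>> {
    row_group_ranges(column.len(), max_rows_per_group)
        .into_iter()
        .map(|range| column[range].to_vec())
        .collect()
}
```

With ten rows and a limit of four rows per group, `row_group_ranges(10, 4)` gives the ranges `0..4`, `4..8`, and `8..10`. Because each group is self-contained, a reader can load one group back into memory as a batch without touching the rest of the file.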

DataFusion

DataFusion is an extensible, parallel query execution engine built on top of Arrow. DataFusion offers both a DataFrame API and a SQL API; ArrowDB uses the SQL API to support SQL queries.
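To give a feel for what a query engine does with columnar data, the sketch below evaluates something like `SELECT name FROM users WHERE id > 1` by hand: the predicate is computed over the `id` column to produce a boolean mask, and the mask is then applied to the `name` column. This is a simplified illustration in plain Rust, not DataFusion's implementation or API; the function names and the `users` table are assumptions.

```rust
/// Evaluates a predicate over one column, producing one mask entry per row.
pub fn build_mask<T>(column: &[T], predicate: impl Fn(&T) -> bool) -> Vec<bool> {
    column.iter().map(predicate).collect()
}

/// Keeps the values of `column` whose mask entry is `true`.
pub fn filter<T: Clone>(column: &[T], mask: &[bool]) -> Vec<T> {
    assert_eq!(column.len(), mask.len(), "mask must cover every row");
    column
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(value, _)| value.clone())
        .collect()
}

/// Roughly `SELECT name FROM users WHERE id > min_id`, done column by column.
/// The `users` table with `id` and `name` columns is hypothetical.
pub fn names_with_id_above(ids: &[i32], names: &[String], min_id: i32) -> Vec<String> {
    // Filter step: only the `id` column is read to decide which rows survive.
    let mask = build_mask(ids, |id| *id > min_id);
    // Projection step: only the `name` column is materialized in the output.
    filter(names, &mask)
}
```

The point of the columnar layout is visible here: the filter reads only the column named in the predicate, and the projection copies only the column named in the select list.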

Arrow Flight RPC

Arrow Flight RPC is a protocol for exchanging streams of Arrow RecordBatches over the wire. On the server side in Rust, Arrow Flight is implemented using Tonic, which is a gRPC framework. gRPC uses Protocol Buffers (protobuf) to define both the service and the structure of the messages it exchanges.

On the client side in Rust, the FlightServiceClient is used to request and receive Arrow RecordBatches from the server.

WebAssembly

WebAssembly (Wasm) is a portable bytecode format that is best known for running in the browser, though it is not limited to browsers. It is supported by all modern browsers and can be used to run Rust code in the browser. Since Wasm running in the browser doesn't have access to the file system, ArrowDB exists only in memory there. The arrow-db-wasm crate contains Rust code for interacting with Arrow data in the browser. The arrow-db-browser app is a React app that uses the arrow-db-wasm crate to manipulate Arrow data in the browser.
