chore: Add initial files and configurations for qurious project
holicc committed Jul 24, 2024
0 parents commit 0f38298
Showing 96 changed files with 11,492 additions and 0 deletions.
12 changes: 12 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,12 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for more information:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
# https://containers.dev/guide/dependabot

version: 2
updates:
  - package-ecosystem: "devcontainers"
    directory: "/"
    schedule:
      interval: weekly
44 changes: 44 additions & 0 deletions .github/workflows/rust.yml
@@ -0,0 +1,44 @@
name: Rust

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

env:
  CARGO_TERM_COLOR: always

jobs:
  linux-test:
    name: cargo test
    runs-on: ubuntu-latest
    container:
      image: amd64/rust
    services:
      postgres:
        image: postgres
        env:
          POSTGRES_PASSWORD: postgres
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: cargo test --lib --tests --features postgresql
        env:
          POSTGRES_HOST: postgres
          POSTGRES_PORT: 5432

  linux-build:
    runs-on: ubuntu-latest
    container:
      image: amd64/rust
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: cargo build --release

19 changes: 19 additions & 0 deletions .gitignore
@@ -0,0 +1,19 @@
# Generated by Cargo
# will have compiled files and executables
debug/
target/

# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
# More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
Cargo.lock

# These are backup files generated by rustfmt
**/*.rs.bk

# MSVC Windows builds of rustc generate these, which store debugging information
*.pdb


# Added by cargo

/target
8 changes: 8 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,8 @@
{
    "rust-analyzer.linkedProjects": [
        "./Cargo.toml"
    ],
    "rust-analyzer.cargo.features": [
        "postgresql"
    ]
}
29 changes: 29 additions & 0 deletions Cargo.toml
@@ -0,0 +1,29 @@
[workspace]
members = ["qurious", "sqlparser"]
resolver = "2"


[workspace.package]
authors = ["Longshan Lu"]
description = "Qurious"
homepage = "https://github.com/holicc/qurious"
license = "CC-BY-SA-4.0"
readme = "README.md"
repository = "https://github.com/holicc/qurious"
rust-version = "1.79"
version = "1.0.0"
edition = "2021"

[workspace.dependencies]
sqlparser = { path = "sqlparser" }
parquet = "52.0.0"
arrow = "52.0.0"
url = "2.5.0"
tokio = { version = "1.37.0", features = ["full"] }
async-trait = "0.1.80"
tokio-stream = "0.1.15"
log = "0.4.21"


[workspace.lints.rust]
unused_imports = "deny"
2 changes: 2 additions & 0 deletions Makefile
@@ -0,0 +1,2 @@
check:
	cargo check --all-features
25 changes: 25 additions & 0 deletions README.md
@@ -0,0 +1,25 @@
# Qurious

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Build Status][actions-badge]][actions-url]

[actions-badge]: https://github.com/holicc/qurious/actions/workflows/rust.yml/badge.svg
[actions-url]: https://github.com/holicc/qurious/actions?query=branch%3Amain

## Description

> Yet another SQL query engine, inspired by Apache DataFusion.

Qurious is a high-performance, in-memory query engine written in Rust and built on top of the Apache Arrow framework. It offers a powerful and familiar SQL interface, similar to Apache DataFusion, allowing you to efficiently analyze large datasets stored in various formats. By leveraging the strengths of Arrow, Qurious provides efficient memory management and seamless data exchange with other Arrow-based tools.

## Development Status

Qurious is in early development. It is being actively worked on but is not yet ready for production use. The features listed below are still under development and subject to change.

## Key Features

- SQL Compatibility: Qurious supports a wide range of SQL functionalities, making it easy to learn and use for those familiar with SQL.
- High Performance: Benefiting from Rust's efficiency and memory safety, Qurious provides fast query execution times on large datasets, enabling you to gain insights quickly.
- In-Memory Processing: Data is processed within the system's memory for blazing-fast performance, making it ideal for real-time analytics or scenarios demanding rapid response times.
- Supported Data Formats: Qurious aims to read and write data in several formats, including CSV, Parquet, JSON, and Avro.
- Integration with Arrow ecosystem: Qurious seamlessly integrates with other Arrow-based tools and libraries, enabling smooth data exchange and workflow optimization.
14 changes: 14 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,14 @@
version: "3.8"
services:
  postgresql:
    image: postgres
    container_name: postgres_db
    restart: always
    environment:
      POSTGRES_DB: qurious
      POSTGRES_USER: root
      POSTGRES_PASSWORD: root
    volumes:
      - ./tests/testdata/db/pg/migration.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - 5433:5432
26 changes: 26 additions & 0 deletions qurious/Cargo.toml
@@ -0,0 +1,26 @@
[package]
name = "qurious"
version = "0.1.0"
edition = "2021"


[features]
default = []
postgresql = ["dep:pgwire", "dep:chrono"]

[dependencies]
sqlparser = { workspace = true }
parquet = { workspace = true }
arrow = { workspace = true }
url = { workspace = true }
tokio = { workspace = true }
async-trait = { workspace = true }
tokio-stream = { workspace = true }
log = { workspace = true }
# postgres
pgwire = { version = "0.23.0", optional = true }
chrono = { version = "0.4.38", optional = true }


[dev-dependencies]
arrow = { version = "52.0.0", features = ["prettyprint", "test_utils"] }
32 changes: 32 additions & 0 deletions qurious/src/common/join_type.rs
@@ -0,0 +1,32 @@
use std::fmt::Display;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum JoinType {
    Left,
    Right,
    Inner,
    Full,
}

impl Display for JoinType {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            JoinType::Left => write!(f, "Left Join"),
            JoinType::Right => write!(f, "Right Join"),
            JoinType::Inner => write!(f, "Inner Join"),
            JoinType::Full => write!(f, "Full Join"),
        }
    }
}

impl From<sqlparser::ast::JoinType> for JoinType {
    fn from(value: sqlparser::ast::JoinType) -> Self {
        match value {
            sqlparser::ast::JoinType::Inner => JoinType::Inner,
            sqlparser::ast::JoinType::Left => JoinType::Left,
            sqlparser::ast::JoinType::Right => JoinType::Right,
            sqlparser::ast::JoinType::Full => JoinType::Full,
            _ => unimplemented!(),
        }
    }
}
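
The `From` impl in `join_type.rs` panics through `unimplemented!()` for any join kind outside the four it lists. A minimal sketch of a fallible alternative using `TryFrom` is shown here. It is not part of this commit: `AstJoinType` is a stand-in for `sqlparser::ast::JoinType` (its `Cross` variant is hypothetical), and `UnsupportedJoin` is an illustrative error type.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Stand-in for `sqlparser::ast::JoinType`. The real enum lives in the
/// workspace's `sqlparser` crate; `Cross` is a hypothetical extra variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AstJoinType {
    Inner,
    Left,
    Right,
    Full,
    Cross,
}

/// Mirror of the engine-side `JoinType` from the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType {
    Left,
    Right,
    Inner,
    Full,
}

/// Illustrative error carrying the join kind that could not be converted.
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedJoin(pub AstJoinType);

impl fmt::Display for UnsupportedJoin {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unsupported join type: {:?}", self.0)
    }
}

impl std::error::Error for UnsupportedJoin {}

impl TryFrom<AstJoinType> for JoinType {
    type Error = UnsupportedJoin;

    fn try_from(value: AstJoinType) -> Result<Self, Self::Error> {
        match value {
            AstJoinType::Inner => Ok(JoinType::Inner),
            AstJoinType::Left => Ok(JoinType::Left),
            AstJoinType::Right => Ok(JoinType::Right),
            AstJoinType::Full => Ok(JoinType::Full),
            // Return an error instead of panicking on unknown join kinds.
            other => Err(UnsupportedJoin(other)),
        }
    }
}
```

With this shape, a caller could propagate the error with `?` rather than crash the query on an unsupported join.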
3 changes: 3 additions & 0 deletions qurious/src/common/mod.rs
@@ -0,0 +1,3 @@
pub mod join_type;
pub mod table_relation;
pub mod table_schema;
86 changes: 86 additions & 0 deletions qurious/src/common/table_relation.rs
@@ -0,0 +1,86 @@
use std::{fmt::Display, sync::Arc};

#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum TableRelation {
    /// An unqualified table reference, e.g. "table"
    Bare {
        /// The table name
        table: Arc<str>,
    },
    /// A partially resolved table reference, e.g. "schema.table"
    Partial {
        /// The schema containing the table
        schema: Arc<str>,
        /// The table name
        table: Arc<str>,
    },
    /// A fully resolved table reference, e.g. "catalog.schema.table"
    Full {
        /// The catalog (aka database) containing the table
        catalog: Arc<str>,
        /// The schema containing the table
        schema: Arc<str>,
        /// The table name
        table: Arc<str>,
    },
}

impl TableRelation {
    fn parse_str(a: &str) -> Self {
        let mut idents = a.split('.').collect::<Vec<&str>>();

        match idents.len() {
            1 => TableRelation::Bare {
                table: idents.remove(0).into(),
            },
            2 => TableRelation::Partial {
                schema: idents.remove(0).into(),
                table: idents.remove(0).into(),
            },
            3 => TableRelation::Full {
                catalog: idents.remove(0).into(),
                schema: idents.remove(0).into(),
                table: idents.remove(0).into(),
            },
            // More than three parts: keep the whole string as a bare table name
            _ => TableRelation::Bare { table: a.into() },
        }
    }

    /// Return the fully qualified name of the table
    pub fn to_quanlify_name(&self) -> String {
        match self {
            TableRelation::Bare { table } => table.to_string(),
            TableRelation::Partial { schema, table } => {
                format!("{}.{}", schema, table)
            }
            TableRelation::Full { catalog, schema, table } => {
                format!("{}.{}.{}", catalog, schema, table)
            }
        }
    }
}

impl Display for TableRelation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        self.to_quanlify_name().fmt(f)
    }
}

impl From<String> for TableRelation {
    fn from(value: String) -> Self {
        TableRelation::parse_str(&value)
    }
}

impl From<&str> for TableRelation {
    fn from(value: &str) -> Self {
        TableRelation::parse_str(value)
    }
}

#[cfg(test)]
mod tests {

    #[test]
    fn test_table_function() {}
}
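
The test module in `table_relation.rs` is empty, so the parsing rules are not exercised yet. The sketch below restates them in a self-contained form that could seed such tests. It is an illustration rather than the committed code: the enum is trimmed to the derives the example needs, the parser is written with slice patterns instead of `remove(0)`, and `qualified_name` is a hypothetical free function standing in for the method in the diff.

```rust
use std::sync::Arc;

/// Trimmed copy of `TableRelation`, kept only so the example is self-contained.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TableRelation {
    Bare { table: Arc<str> },
    Partial { schema: Arc<str>, table: Arc<str> },
    Full { catalog: Arc<str>, schema: Arc<str>, table: Arc<str> },
}

impl From<&str> for TableRelation {
    fn from(value: &str) -> Self {
        // Split on '.' and classify by the number of parts, as in the diff.
        let idents: Vec<&str> = value.split('.').collect();
        match idents.as_slice() {
            [table] => TableRelation::Bare {
                table: (*table).into(),
            },
            [schema, table] => TableRelation::Partial {
                schema: (*schema).into(),
                table: (*table).into(),
            },
            [catalog, schema, table] => TableRelation::Full {
                catalog: (*catalog).into(),
                schema: (*schema).into(),
                table: (*table).into(),
            },
            // Four or more parts fall back to a bare name holding the full input.
            _ => TableRelation::Bare {
                table: value.into(),
            },
        }
    }
}

/// Hypothetical helper: joins the parts back together with '.'.
pub fn qualified_name(relation: &TableRelation) -> String {
    match relation {
        TableRelation::Bare { table } => table.to_string(),
        TableRelation::Partial { schema, table } => format!("{}.{}", schema, table),
        TableRelation::Full { catalog, schema, table } => {
            format!("{}.{}.{}", catalog, schema, table)
        }
    }
}
```

One case worth covering in a test is an input with four or more dot-separated parts: it is stored whole as a bare table name rather than being rejected.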
11 changes: 11 additions & 0 deletions qurious/src/common/table_schema.rs
@@ -0,0 +1,11 @@
use std::sync::Arc;

use crate::common::table_relation::TableRelation;
use arrow::datatypes::Schema;

pub type TableSchemaRef = Arc<TableSchema>;

pub struct TableSchema {
    schema: Schema,
    relation: TableRelation,
}
59 changes: 59 additions & 0 deletions qurious/src/dataframe/mod.rs
@@ -0,0 +1,59 @@
use std::sync::Arc;

use arrow::{datatypes::SchemaRef, record_batch::RecordBatch};

use crate::{
    error::Result,
    logical::{
        expr::{self, LogicalExpr},
        plan::{Aggregate, Filter, LogicalPlan, Projection},
    },
    planner::QueryPlanner,
};

#[derive(Debug)]
pub struct DataFrame {
    planner: Arc<dyn QueryPlanner>,
    plan: LogicalPlan,
}

impl DataFrame {
    pub fn new(plan: LogicalPlan, planner: Arc<dyn QueryPlanner>) -> Self {
        Self { plan, planner }
    }

    pub fn plan(&self) -> LogicalPlan {
        self.plan.clone()
    }

    pub fn schema(&self) -> SchemaRef {
        self.plan.schema()
    }

    pub fn collect(self) -> Result<Vec<RecordBatch>> {
        self.planner.create_physical_plan(&self.plan)?.execute()
    }
}

impl DataFrame {
    pub fn project(self, columns: Vec<LogicalExpr>) -> Result<Self> {
        Projection::try_new(self.plan, columns).map(|plan| Self {
            planner: self.planner,
            plan: LogicalPlan::Projection(plan),
        })
    }

    // `self` is consumed, so the plan can be moved rather than cloned.
    pub fn filter(self, predicate: LogicalExpr) -> Result<Self> {
        Ok(Self {
            planner: self.planner,
            plan: LogicalPlan::Filter(Filter::try_new(self.plan, predicate)?),
        })
    }

    pub fn aggregate(self, group_by: Vec<LogicalExpr>, aggr_expr: Vec<expr::AggregateExpr>) -> Result<Self> {
        Aggregate::try_new(self.plan, group_by, aggr_expr).map(|a| Self {
            planner: self.planner,
            plan: LogicalPlan::Aggregate(a),
        })
    }
}
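
The `DataFrame` methods in this file take `self` by value and return a new frame that wraps the previous plan, so calls can be chained. The sketch below shows that builder pattern with simplified stand-in types. `Plan`, `Planner`, `TextPlanner`, and `Frame` are all hypothetical names for illustration and do not match the crate's real `LogicalPlan` or `QueryPlanner` APIs.

```rust
use std::sync::Arc;

/// Simplified stand-in for a logical plan tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan { table: String },
    Filter { input: Box<Plan>, predicate: String },
    Projection { input: Box<Plan>, columns: Vec<String> },
}

/// Stand-in for a planner; here it only renders the plan as text.
pub trait Planner: std::fmt::Debug {
    fn describe(&self, plan: &Plan) -> String;
}

#[derive(Debug)]
pub struct TextPlanner;

impl Planner for TextPlanner {
    fn describe(&self, plan: &Plan) -> String {
        match plan {
            Plan::Scan { table } => format!("Scan({})", table),
            Plan::Filter { input, predicate } => {
                format!("Filter({}) <- {}", predicate, self.describe(input))
            }
            Plan::Projection { input, columns } => {
                format!("Project({}) <- {}", columns.join(", "), self.describe(input))
            }
        }
    }
}

#[derive(Debug)]
pub struct Frame {
    planner: Arc<dyn Planner>,
    plan: Plan,
}

impl Frame {
    pub fn new(plan: Plan, planner: Arc<dyn Planner>) -> Self {
        Self { plan, planner }
    }

    /// Consumes the frame and moves its plan under a new Filter node.
    pub fn filter(self, predicate: &str) -> Self {
        Self {
            planner: self.planner,
            plan: Plan::Filter {
                input: Box::new(self.plan),
                predicate: predicate.to_string(),
            },
        }
    }

    /// Consumes the frame and moves its plan under a new Projection node.
    pub fn project(self, columns: &[&str]) -> Self {
        Self {
            planner: self.planner,
            plan: Plan::Projection {
                input: Box::new(self.plan),
                columns: columns.iter().map(|c| c.to_string()).collect(),
            },
        }
    }

    pub fn explain(&self) -> String {
        self.planner.describe(&self.plan)
    }
}
```

Under these assumptions, `Frame::new(scan, Arc::new(TextPlanner)).filter("a > 1").project(&["a", "b"])` builds a tree with the projection as the outermost node. Because each step takes ownership of the previous frame, the plan is moved into the new node without a clone.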