Design-first Open Source Data Management Toolkit. Simplifies data workflows with modular, reproducible solutions.
DataJourney demonstrates how organizations can effectively manage and utilize data by harnessing the power of open-source technologies. It's designed to help navigate the complex landscape of data tools, offering a structured approach to building scalable and reproducible data workflows.
Built on open-source principles, the framework guides users through essential steps, from identifying goals and selecting tools to testing and customising workflows. With its flexible, modular design, DataJourney can be tailored to individual needs, making it an invaluable toolkit for data professionals.
The framework is built from layers that can be added or removed, glued together with open source. Each layer has a well-defined way of communicating with the others:
- P0 (Base): static home(s) to keep it together (GitHub)
- P1 (Tooling): tooling and the strings that connect it (powered by open source)
- P2 (Maintenance + Monitoring): environments and automations (Pixi + GitHub Actions)
- P3 (Abstraction): abstraction layer(s), a CLI/task manager for users to interact with (Pixi)
{✨ = Experimental, ✅ = Implemented}
| Status | Workflow Description |
|---|---|
| ✅ | Python packaging framework design principles |
| ✅ | GitHub Actions configured |
| ✅ | Vale.sh configured at PR level |
| ✅ | Pre-commit hooks configured for code linting/formatting |
| ✨ | Hello world LLM design example based on LangChain |
| ✅ | Environment management via pixi |
| ✅ | Reading data from online sources using intake (a minimal sketch follows this table) |
| ✅ | Sample pipeline built using Dagster |
| ✅ | Building dashboards using holoviews + panel |
| ✅ | Exploratory data analysis (EDA) using mito |
| ✅ | Web UI built on Flask |
| ✅ | Web UI re-done and expanded with FastHTML |
| ✅ | Leverage AI models to analyse data (GitHub AI models Beta) |
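As a taste of the intake-based ingestion workflow, here is a minimal, hypothetical sketch; the catalog path, source name, and printed columns are illustrative, not the project's actual catalog.

```python
# Hypothetical intake sketch: read a CSV source described in a catalog file.
# "catalog.yml" and the "trees" source name are assumptions for illustration.
import intake

catalog = intake.open_catalog("catalog.yml")
df = catalog.trees.read()  # loads the source into a pandas DataFrame
print(df.head())
```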
- Clone DJ: `git clone git@github.com:sayantikabanik/DataJourney.git`
- Generate & add `GITHUB_TOKEN`, instructions here (required to run the LLM-based workflows; a hedged token-check sketch follows this list)
- Switch directory: `cd DataJourney`
- Download pixi: prefix.dev
- Activate the env: `pixi shell`
- Install the DJ framework locally: `pixi run DJ_package`
- List all the tasks: `pixi task list`
- Execute a task from the list: `pixi run <TASK>`
- Execute a task with verbosity enabled: `pixi run -v <TASK>`
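As a rough illustration of what the token step involves (the actual `GIT_TOKEN_CHECK` task may verify things differently), a sketch that checks `GITHUB_TOKEN` is set and accepted by the GitHub API:

```python
# Hypothetical GITHUB_TOKEN check: confirm the variable is set and that
# GitHub accepts it. The real GIT_TOKEN_CHECK task may work differently.
import os
import sys
import urllib.request

token = os.environ.get("GITHUB_TOKEN")
if not token:
    sys.exit("GITHUB_TOKEN is not set; the LLM-based workflows need it.")

request = urllib.request.Request(
    "https://api.github.com/user",
    headers={"Authorization": f"Bearer {token}"},
)
with urllib.request.urlopen(request) as response:
    print(f"Token accepted (HTTP {response.status})")
```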
| Task Name | Description |
|---|---|
| `GIT_TOKEN_CHECK` | Verifies the availability and validity of the Git authentication token. |
| `DJ_package` | Prepares and builds the Python package for the DataJourney project. |
| `DJ_pre_commit` | Runs pre-commit hooks to ensure code quality and adherence to standards. |
| `DJ_dagster` | Sets up and runs a Dagster workflow for orchestration in the project. |
| `DJ_fasthtml_app` | Executes a FastHTML-based web application. |
| `DJ_flask_app` | Configures and runs a Flask-based application for data services. |
| `DJ_mito_app` | Launches the Mito application for interactive data analysis in notebooks. |
| `DJ_panel_app` | Executes a Panel dashboard app for data visualization and analytics. |
| `DJ_llm_analysis` | Performs analysis using large language models (LLMs) on project data. |
| `DJ_hello_world_langchain` | Sets up a basic LangChain app as a "Hello World" example for LLMs. |
| `DJ_sync_dataset_trees` | Downloads and synchronizes the trees.csv dataset into the project structure. |
Just like the name suggests, pre-commit hooks format the code to PEP standards before committing. More details

```shell
pixi run DJ_pre_commit
```
Run the LLM-based analysis workflow:

```shell
pixi run DJ_llm_analysis
```
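For orientation, a minimal sketch of how a `GITHUB_TOKEN`-backed LLM call could look; the endpoint, model name, and prompt are assumptions, not the task's actual implementation.

```python
# Illustrative only: call an LLM through GitHub's AI models beta using the
# OpenAI-compatible client. Endpoint and model name are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",  # assumed beta endpoint
    api_key=os.environ["GITHUB_TOKEN"],
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarise the trees.csv dataset columns."}],
)
print(response.choices[0].message.content)
```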
Run the sample Dagster pipeline:

```shell
pixi run DJ_dagster
```
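To show the shape of a Dagster pipeline, here is a generic sketch with two software-defined assets; it is not the project's actual pipeline.

```python
# Generic Dagster sketch: two assets, one depending on the other,
# materialized in-process. Not the project's actual pipeline.
from dagster import asset, materialize

@asset
def raw_numbers():
    # Stand-in for a real ingestion step (e.g. an intake source).
    return [1, 2, 3, 4]

@asset
def doubled_numbers(raw_numbers):
    return [n * 2 for n in raw_numbers]

if __name__ == "__main__":
    result = materialize([raw_numbers, doubled_numbers])
    assert result.success
```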
Build the dashboard with holoviews + panel:

```shell
pixi run DJ_panel_app
```

NOTE: The generated dashboard is exported to HTML format and saved as stock_price_twilio_dashboard.
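The general pattern behind a holoviews + panel dashboard, as a small sketch; the actual app builds a richer stock-price dashboard, and the data here is illustrative.

```python
# Minimal holoviews + panel sketch: render a curve and export it to HTML,
# mirroring how DJ_panel_app saves its dashboard. Data is illustrative.
import holoviews as hv
import panel as pn

hv.extension("bokeh")

curve = hv.Curve([(0, 0), (1, 1), (2, 4), (3, 9)], "day", "price")
dashboard = pn.Column("# Sample dashboard", curve)
dashboard.save("sample_dashboard.html")
```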
Run exploratory data analysis with Mito (to explore further, visit trymito.io):

```shell
pixi run DJ_mito_app
```
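Mito runs as an interactive spreadsheet inside Jupyter; a minimal sketch of starting it on a toy DataFrame (the task launches the project's own EDA setup):

```python
# Minimal Mito sketch for a Jupyter notebook cell. The DataFrame is a toy
# example; DJ_mito_app launches the project's own EDA setup.
import pandas as pd
import mitosheet

df = pd.DataFrame({"species": ["oak", "pine"], "height_m": [21.3, 17.8]})
mitosheet.sheet(df)  # opens the interactive Mito spreadsheet on df
```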
```shell
# Run FastHTML app
pixi run DJ_fasthtml_app
```
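For context on what the FastHTML layer looks like, a minimal sketch; the project's app is fuller, and the route and content here are illustrative.

```python
# Minimal FastHTML sketch: one route returning a titled page. Illustrative
# only; DJ_fasthtml_app runs the project's own application.
from fasthtml.common import fast_app, serve, Titled, P

app, rt = fast_app()

@rt("/")
def get():
    return Titled("DataJourney", P("Hello from a FastHTML sketch"))

serve()  # starts a local development server
```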