Commit

added paper, modified read me, cleaned the analysis file
chelcie de almeida authored and chelcie de almeida committed Aug 27, 2023
1 parent ecb9420 commit b0a040e
Showing 4 changed files with 37 additions and 12 deletions.
32 changes: 24 additions & 8 deletions README.md
@@ -1,33 +1,49 @@
# JinnyDB

JinnyDB is a Natural Language to Databases (NLIDB) application powered by LangChain and GPT 3.5 Turbo that allows users to prompt their database using natural language to retrieve data from their databases
JinnyDB is a Natural Language Interface to Databases (NLIDB) application, powered by LangChain and the GPT-3.5 Turbo model, that lets users query their databases in natural language.

This experiment is based on my research paper which you can read [here](https://github.com/ChelcieDeAlmeida/JinnyDB/tree/main/research_paper).

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [License](#license)
- [TO DO](#to-do)

## Installation

Once you download this repo, run the following command from the project directory.

```pip install -r requirements.txt```

## Usage
I used .env files for my environment set up but you can do as you please.

This program is available and open to all users
You will need an account with OpenAI if you plan on using the GPT-3.5-Turbo-16K model but any suitable model should work fine.
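
As a rough sketch of the `.env`-based setup mentioned above, the snippet below assembles a Postgres connection URL from environment variables. The variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) are hypothetical and not taken from this repo; with python-dotenv installed, calling `load_dotenv()` beforehand would copy the values from a `.env` file into `os.environ`.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Assemble a SQLAlchemy-style Postgres URL from environment variables.

    All variable names here are illustrative assumptions, not the ones
    this project necessarily uses.
    """
    user = env["DB_USER"]
    # Escape characters such as '@' so they cannot break the URL.
    password = quote_plus(env["DB_PASSWORD"])
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env["DB_NAME"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}"
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try without touching the real environment, for example `build_postgres_url({"DB_USER": "me", "DB_PASSWORD": "secret", "DB_NAME": "bookings"})`.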

## License
## Usage

Currently does not have any license
This experiment used two databases through postgres:
- AdventureWorks [Download/Installation Here](https://github.com/lorint/AdventureWorks-for-Postgres)
- Bookings [Download/Installation Here](https://postgrespro.com/docs/postgrespro/10/demodb-bookings)

## Contributing
Note that I modified these databases to suit my purposes. You may do the same or work with your own databases,
but if you would like to reproduce this exact project, you can find the queries I used in the db folder of this repo.

No Active Contribution
Some of the changes I made to the data:
- AdventureWorks
  - Migrated all the tables into a single schema called 'Prod'
  - Dropped the views that were built with the db
- Bookings
  - I converted the coordinates column in the airports_data table from point to text (my use case at the moment doesn't call for it)
  - separated the coordinates column to longitude and latitude
  - This isn't advisable if you intend to use them, Point is a better data type for coordinates (PostGIS is even better if you're interested in spatial data)
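
To illustrate the coordinates change described in the list above, here is a minimal Python sketch that splits a point rendered as text into two numbers. It assumes the text looks like `(x,y)` with longitude first and latitude second; the function name is hypothetical, and the actual conversion in this project was done with the SQL queries in the db folder, not with this code.

```python
def split_point_text(point_text):
    """Split a Postgres point rendered as text, e.g. "(37.9,55.4)".

    Assumes the order is (longitude, latitude); check your own data
    before relying on that.
    """
    # Drop surrounding whitespace and the enclosing parentheses.
    cleaned = point_text.strip().strip("()")
    x_text, y_text = cleaned.split(",")
    longitude, latitude = float(x_text), float(y_text)
    return longitude, latitude
```

For example, `split_point_text("(37.9063,55.4088)")` yields a `(longitude, latitude)` tuple of floats.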


Below is a non-exhaustive list of tasks I intend to complete, but feel free to participate if you'd like.

## TO DO:
- [ ] Build Backend
- [ ] Build frontend on React
- [ ] Add template logic for dialect/user input/database object definitions
- [ ] Integrate intermediate steps
- [ ] Add logic to re-run when """OutputParserException: Could not parse LLM output: `I now know the final answer.`"""
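
One possible shape for the re-run item above is a small retry wrapper, sketched below. `ParseFailure` is a stand-in exception defined only so the example is self-contained; in the notebook you would pass the exception class LangChain actually raises through `retry_on`, and `run` would be something like `agent_executor.run`. The names and defaults are assumptions, not part of this repo.

```python
import time


class ParseFailure(Exception):
    """Stand-in for the parser error raised by the agent (hypothetical)."""


def run_with_retry(run, question, retry_on=(ParseFailure,),
                   max_attempts=3, delay_seconds=0.0):
    """Call run(question), retrying when one of retry_on is raised."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run(question)
        except retry_on as error:
            last_error = error
            # Pause between attempts, but not after the final one.
            if attempt < max_attempts and delay_seconds:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Exceptions that are not listed in `retry_on` propagate immediately, so genuine bugs are not hidden behind repeated (and billable) model calls.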
13 changes: 10 additions & 3 deletions lang_gpt_analysis.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 91,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -19,7 +19,7 @@
},
{
"cell_type": "code",
"execution_count": 92,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -588,6 +588,13 @@
"agent_executor.run(\"How many distinct flights departed from Moscow?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 51,
@@ -1573,7 +1580,7 @@
}
],
"source": [
"#TC12 : FAIL (checked the airports view which was a good idea and less complicated than checking airport_data table\n",
"#TC12 : FAIL (checked the airports view which was a good idea and less complicated than checking airport_data table which is what I did\n",
"# but couldn't perform a nested query with aggregate function)\n",
"agent_executor.run(\"Which cities have more than 1 airport? Return the airport code, airport name, and city in your response.\")"
]
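The TC12 comment in the hunk above says the agent could not produce a nested query with an aggregate function. As a hedged illustration of the kind of query that question calls for, the sketch below runs one against an in-memory SQLite table. The table and column names (`airports`, `airport_code`, `airport_name`, `city`) are assumptions modeled on the question's wording, the three rows are illustrative sample data, and SQLite stands in for Postgres only to keep the example self-contained.

```python
import sqlite3

# Inner query: cities that appear more than once.
# Outer query: the airports located in those cities.
QUERY = """
SELECT airport_code, airport_name, city
FROM airports
WHERE city IN (
    SELECT city FROM airports GROUP BY city HAVING COUNT(*) > 1
)
ORDER BY city, airport_code
"""


def airports_in_multi_airport_cities(rows):
    """Load rows into a throwaway table and run the nested aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE airports ("
            "airport_code TEXT PRIMARY KEY, airport_name TEXT, city TEXT)"
        )
        conn.executemany("INSERT INTO airports VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Illustrative sample rows, not the contents of the Bookings database.
SAMPLE_ROWS = [
    ("SVO", "Sheremetyevo", "Moscow"),
    ("DME", "Domodedovo", "Moscow"),
    ("LED", "Pulkovo", "St. Petersburg"),
]
```

Calling `airports_in_multi_airport_cities(SAMPLE_ROWS)` returns only the two Moscow rows, since St. Petersburg has a single airport in the sample.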
4 changes: 3 additions & 1 deletion requirements.txt
@@ -2,4 +2,6 @@ langchain==0.0.242
langchain-experimental==0.0.8
langsmith==0.0.14
psycopg2==2.9.6
python-dotenv==1.0.0
python-dotenv==1.0.0
pandas==1.5.3
SQLAlchemy==2.0.19
Binary file not shown.
