Commit

feat: adds query translation
micheleriva committed Nov 7, 2024
1 parent bbdb5cd commit 50ab51a
Showing 2 changed files with 38 additions and 19 deletions.
10 changes: 8 additions & 2 deletions llm/src/bin/query_translator.rs
@@ -13,12 +13,18 @@ async fn main() -> Result<()> {
"tags": "enum[]"
}"#;

- let result = qt
+ let result1 = qt
.translate(q1.to_string(), Some(schema.to_string()))
.await?
.unwrap();

- dbg!(result);
+ println!("{}", result1);
+
+ let q2 = "What are the best headphones under $200 for listening to hi-fi music?";
+
+ let result2 = qt.translate(q2.to_string(), None).await?.unwrap();
+
+ print!("{}", result2);

Ok(())
}
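Both calls in `main` end in `.await?.unwrap()`, so a `None` reply from the model would panic. A minimal sketch of handling that case without panicking follows. `translate_stub` and `translate_or_fallback` are hypothetical names that do not appear in the commit, and the fallback to a plain full-text query is an assumption made for illustration, not the project's behavior.

```rust
/// Hypothetical stand-in for `QueryTranslator::translate`, which in the diff
/// resolves to a `Result` wrapping an `Option<String>`.
fn translate_stub(reply: Option<&str>) -> Result<Option<String>, String> {
    Ok(reply.map(str::to_string))
}

/// Returns the translated query, or falls back to a term-only query built
/// from the raw user input when the model produced no content.
fn translate_or_fallback(query: &str, reply: Option<&str>) -> Result<String, String> {
    match translate_stub(reply)? {
        Some(json) => Ok(json),
        None => {
            // Escapes only backslashes and double quotes; a real implementation
            // would serialize through a JSON library instead.
            let escaped = query.replace('\\', "\\\\").replace('"', "\\\"");
            Ok(format!("{{\"term\":\"{}\"}}", escaped))
        }
    }
}
```

Calling `translate_or_fallback("red wine", None)` yields a term-only query, which mirrors the prompt rule that a query without a schema uses only the "term" field.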
47 changes: 30 additions & 17 deletions llm/src/query_translator/mod.rs
@@ -6,6 +6,7 @@ use std::collections::HashMap;
use std::string::ToString;
use std::sync::Arc;
use textwrap::dedent;
+ use utils::parse_json_safely;

pub struct QueryTranslator {
pub llm: Arc<Model>,
@@ -32,34 +33,40 @@ impl QueryTranslator {
Let me show you what you need to do with some examples.
Example:
- - Query: \`"What are the red wines that cost less than 20 dollars?"\`
- - Schema: \`{ name: 'string', content: 'string', price: 'number', tags: 'enum[]' }\`
- - Generated query: \`{ "term": "", "where": { "tags": { "containsAll": ["red", "wine"] }, "price": { "lt": 20 } } }\`
+ - Query: `"What are the red wines that cost less than 20 dollars?"`
+ - Schema: `{ name: 'string', content: 'string', price: 'number', tags: 'enum[]' }`
+ - Generated query: `{ "term": "", "where": { "tags": { "containsAll": ["red", "wine"] }, "price": { "lt": 20 } } }`
Another example:
- - Query: \`"Show me 5 prosecco wines good for aperitif"\`
- - Schema: \`{ name: 'string', content: 'string', price: 'number', tags: 'enum[]' }\`
- - Generated query: \`{ "term": "prosecco aperitif", "limit": 5 }\`
+ - Query: `"Show me 5 prosecco wines good for aperitif"`
+ - Schema: `{ name: 'string', content: 'string', price: 'number', tags: 'enum[]' }`
+ - Generated query: `{ "term": "prosecco aperitif", "limit": 5 }`
+ One example without schema:
+ - Query: `"What are the best headphones under $200 for listening to hi-fi music?"`
+ - Schema: There is no schema for this query.
+ - Generated query: `{ "term": "best headphones hi-fi under $200" }`
One last example:
- - Query: \`"Show me some wine reviews with a score greater than 4.5 and less than 5.0."\`
- - Schema: \`{ title: 'string', content: 'string', reviews: { score: 'number', text: 'string' } }]\`
- - Generated query: \`{ "term": "", "where": { "reviews.score": { "between": [4.5, 5.0] } } }\`
+ - Query: `"Show me some wine reviews with a score greater than 4.5 and less than 5.0."`
+ - Schema: `{ title: 'string', content: 'string', reviews: { score: 'number', text: 'string' } }]`
+ - Generated query: `{ "term": "", "where": { "reviews.score": { "between": [4.5, 5.0] } } }`
The rules to generate the query are:
- Never use the "embedding" field.
- Every query has a "term" field that is a string. It represents the full-text search terms. Can be empty (will match all documents).
- You can use a "where" field that is an object. It represents the filters to apply to the documents. Its keys and values depend on the schema of the database:
- - If the field is a "string", you should not use operators. Example: \`{ "where": { "title": "champagne" } }\`.
- - If the field is a "number", you can use the following operators: "gt", "gte", "lt", "lte", "eq", "between". Example: \`{ "where": { "price": { "between": [20, 100] } } }\`. Another example: \`{ "where": { "price": { "lt": 20 } } }\`.
- - If the field is an "enum", you can use the following operators: "eq", "in", "nin". Example: \`{ "where": { "tags": { "containsAll": ["red", "wine"] } } }\`.
- - If the field is an "string[]", it's gonna be just like the "string" field, but you can use an array of values. Example: \`{ "where": { "title": ["champagne", "montagne"] } }\`.
- - If the field is a "boolean", you can use the following operators: "eq". Example: \`{ "where": { "isAvailable": true } }\`. Another example: \`{ "where": { "isAvailable": false } }\`.
- - If the field is a "enum[]", you can use the following operators: "containsAll". Example: \`{ "where": { "tags": { "containsAll": ["red", "wine"] } } }\`.
- - Nested properties are supported. Just translate them into dot notation. Example: \`{ "where": { "author.name": "John" } }\`.
+ - If the field is a "string", you should not use operators. Example: `{ "where": { "title": "champagne" } }`.
+ - If the field is a "number", you can use the following operators: "gt", "gte", "lt", "lte", "eq", "between". Example: `{ "where": { "price": { "between": [20, 100] } } }`. Another example: `{ "where": { "price": { "lt": 20 } } }`.
+ - If the field is an "enum", you can use the following operators: "eq", "in", "nin". Example: `{ "where": { "tags": { "containsAll": ["red", "wine"] } } }`.
+ - If the field is an "string[]", it's gonna be just like the "string" field, but you can use an array of values. Example: `{ "where": { "title": ["champagne", "montagne"] } }`.
+ - If the field is a "boolean", you can use the following operators: "eq". Example: `{ "where": { "isAvailable": true } }`. Another example: `{ "where": { "isAvailable": false } }`.
+ - If the field is a "enum[]", you can use the following operators: "containsAll". Example: `{ "where": { "tags": { "containsAll": ["red", "wine"] } } }`.
+ - Nested properties are supported. Just translate them into dot notation. Example: `{ "where": { "author.name": "John" } }`.
- Array of numbers are not supported.
- Array of booleans are not supported.
+ - If there is no schema, just use the "term" field.
Reply with the generated query in a valid JSON format only. Nothing else.
"#,
@@ -157,7 +164,13 @@ impl QueryTranslator {
let response = self.llm.send_chat_request(messages).await?;
let message = &response.choices[0].message;

- Ok(message.clone().content)
+ match message.clone().content {
+ Some(content) => {
+ let json_value = parse_json_safely(content)?;
+ Ok(Some(json_value.to_string()))
+ }
+ None => Ok(None),
+ }
}
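The body of `utils::parse_json_safely` is not part of this diff, so its behavior is unknown here. The sketch below only illustrates one pre-processing step such a helper might plausibly perform: isolating the JSON object when the model wraps its reply in Markdown fences or extra text. `extract_json_object` is a hypothetical name, and the result would still need to go through a real JSON parser.

```rust
/// Hypothetical helper (not from the commit): returns the span from the first
/// `{` to the last `}` in an LLM reply, or `None` if no such span exists.
/// It does not validate that the span is well-formed JSON.
fn extract_json_object(reply: &str) -> Option<&str> {
    let start = reply.find('{')?;
    let end = reply.rfind('}')?;
    if end < start {
        return None;
    }
    // `}` is a single byte, so the inclusive end index is a valid char boundary.
    Some(&reply[start..=end])
}
```

Returning `Option` keeps the "no JSON found" case explicit, in the same spirit as the `match` on `message.clone().content` above.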

fn generate_user_prompt(query: String, schema: Option<String>) -> String {
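Looking back at the system prompt in `llm/src/query_translator/mod.rs`, the per-type operator rules could also be checked in code after the model replies. The sketch below encodes the operators exactly as the prompt lists them; `allowed_operators` and `is_operator_allowed` are hypothetical names, and nothing in the commit indicates that such validation exists. Note that the prompt's "enum" example uses "containsAll" while the rule lists "eq", "in", "nin"; the table follows the rule text.

```rust
// Operator lists copied from the rules in the system prompt.
const NO_OPS: &[&str] = &[];
const NUMBER_OPS: &[&str] = &["gt", "gte", "lt", "lte", "eq", "between"];
const ENUM_OPS: &[&str] = &["eq", "in", "nin"];
const BOOLEAN_OPS: &[&str] = &["eq"];
const ENUM_ARRAY_OPS: &[&str] = &["containsAll"];

/// Hypothetical lookup: operators the prompt allows for a schema field type.
/// Returns `None` for types the prompt marks as unsupported or does not mention.
fn allowed_operators(field_type: &str) -> Option<&'static [&'static str]> {
    match field_type {
        // Strings and string arrays take plain values, no operators.
        "string" | "string[]" => Some(NO_OPS),
        "number" => Some(NUMBER_OPS),
        "enum" => Some(ENUM_OPS),
        "boolean" => Some(BOOLEAN_OPS),
        "enum[]" => Some(ENUM_ARRAY_OPS),
        // "number[]" and "boolean[]" are explicitly unsupported.
        _ => None,
    }
}

/// True when `op` is listed for `field_type` in the prompt rules.
fn is_operator_allowed(field_type: &str, op: &str) -> bool {
    allowed_operators(field_type).map_or(false, |ops| ops.contains(&op))
}
```

A check like `is_operator_allowed("number", "between")` could then reject a generated filter before it reaches the search engine.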
