# Prompt Compression

After a few rounds of conversation, the chat history can become quite long, especially when it contains code and execution results.
This can exceed the context window of the LLM.
One way to solve this problem is to summarize the older rounds of the chat history
and keep only the latest rounds verbatim.

Another way is to store the chat history entries in a vector database and, given the current user request,
retrieve only the relevant entries. However, in TaskWeaver, code is also part of the chat history,
and skipping intermediate code and execution results would make it impossible to correctly
generate the code for the current user request. Therefore, we choose the first approach.

The following figure shows the idea of chat history summarization, where the chat history is divided into two parts:
- Rounds to compress: this part is summarized, and only the summary is kept in the chat history. If the ConversationSummary
  already exists, a new summary is generated based on the previous summary plus the rounds to be compressed.
- Rounds to retain: this part is kept in the chat history without summarization.

```mermaid
flowchart LR
ConversationSummary-->Round1
subgraph Rounds to compress
Round1-->Round2
end
subgraph Rounds to retain
Round2-->Round3-->Round4-->Round5
end
```
Imagine that, at the beginning, the ConversationSummary is empty.
Once the chat history reaches `rounds_to_compress` (default 2) plus `rounds_to_retain` (default 3) rounds,
the ConversationSummary is generated from the oldest `rounds_to_compress` rounds, and the `rounds_to_retain` rounds are kept in the chat history.
After that, only the `rounds_to_retain` rounds remain in the chat history.
The next time the chat history again reaches `rounds_to_compress` plus `rounds_to_retain` rounds,
a new ConversationSummary is generated from the oldest `rounds_to_compress` rounds together with the previous ConversationSummary.
These two parameters control the frequency of chat history summarization: with the defaults, summarization is triggered whenever the history grows to 5 rounds.
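
Below is a minimal Python sketch of this rolling-summary logic, for illustration only. It is not TaskWeaver's actual implementation; the names `summarize` and `maybe_compress` are hypothetical, and a real `summarize` would call the LLM.

```python
from typing import List, Optional, Tuple

ROUNDS_TO_COMPRESS = 2  # cf. round_compressor.rounds_to_compress
ROUNDS_TO_RETAIN = 3    # cf. round_compressor.rounds_to_retain


def summarize(previous_summary: Optional[str], rounds: List[str]) -> str:
    """Stand-in for the LLM call that folds `rounds` into the previous summary."""
    parts = ([previous_summary] if previous_summary else []) + rounds
    return " | ".join(parts)  # a real implementation would prompt the LLM here


def maybe_compress(
    summary: Optional[str], history: List[str]
) -> Tuple[Optional[str], List[str]]:
    """Compress the oldest rounds once the history is long enough."""
    if len(history) >= ROUNDS_TO_COMPRESS + ROUNDS_TO_RETAIN:
        # Fold the oldest rounds into the summary; keep the rest verbatim.
        summary = summarize(summary, history[:ROUNDS_TO_COMPRESS])
        history = history[ROUNDS_TO_COMPRESS:]
    return summary, history


# With the defaults, compression fires once the history reaches 5 rounds,
# leaving the summary plus 3 retained rounds.
summary, history = None, [f"round {i}" for i in range(1, 6)]
summary, history = maybe_compress(summary, history)
assert len(history) == ROUNDS_TO_RETAIN
```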

An example of the chat history summarization in the Code Generator is shown below:
```json
{
  "ConversationSummary": "The user requested the generation of 100 random numbers, which was successfully executed. Then, the user asked to show the top 5 largest numbers from the generated random numbers. The assistant provided a code snippet to sort the generated random numbers in descending order and select the top 5 largest numbers, which was also successfully executed. After that, the user requested to plot the distribution of the 100 numbers, which was successfully executed. The user then asked to count the frequency of numbers in each bin of the histogram and identify the bin with the most numbers for the 0.1 bin width, which was also successfully executed.",
  "Variables": [
    {
      "name": "random_numbers_100",
      "type": "numpy array",
      "description": "An array containing 100 random numbers generated using np.random.rand()"
    },
    {
      "name": "top_5_largest",
      "type": "numpy array",
      "description": "An array containing the top 5 largest numbers from the generated random numbers"
    }
  ]
}
```
The JSON object has two fields:
- ConversationSummary: the summary of the chat history.
- Variables: the variables in the chat history that could be used in the current user request.

The chat history summary of the Planner has only the ConversationSummary field.

The actual code generated in the summarized rounds is discarded; only the variables are kept in the summary
so that the LLM can still refer to these variables in future code generation.
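
For illustration, here is hypothetical code that might be generated in a later round (not actual TaskWeaver output). It references `top_5_largest` from the summary above; the variables are re-stated at the top only to make the snippet self-contained.

```python
import numpy as np

# Re-stated only to make this snippet self-contained; in the real session,
# these variables were defined in earlier, now-summarized rounds.
random_numbers_100 = np.random.rand(100)
top_5_largest = np.sort(random_numbers_100)[-5:][::-1]

# Hypothetical follow-up generation, e.g. for the request
# "compute the mean of the top 5 largest numbers". The LLM can refer to
# `top_5_largest` by name because it appears in the summary's Variables list.
mean_of_top_5 = np.mean(top_5_largest)
print(mean_of_top_5)
```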

One thing to note is that chat history summarization requires calling the LLM, which incurs additional latency and cost.
The prompts for chat history summarization can be found for the [planner](../taskweaver/planner/compression_prompt.yaml)
and the [code generator](../taskweaver/code_interpreter/code_generator/compression_prompt.yaml).

## Configurations
As explained above, there are two parameters controlling the chat history summarization:
`round_compressor.rounds_to_compress` (default 2) and `round_compressor.rounds_to_retain` (default 3).
To enable chat history summarization, you need to set `planner.prompt_compression`
and `code_generator.prompt_compression` to `true`.
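
For example, a `project/taskweaver_config.json` fragment enabling compression might look like the sketch below. The round counts are shown at their default values and can be omitted; they are written as quoted strings here, following the configuration documentation's convention that non-boolean values are strings.

```json
{
  "planner.prompt_compression": true,
  "code_generator.prompt_compression": true,
  "round_compressor.rounds_to_compress": "2",
  "round_compressor.rounds_to_retain": "3"
}
```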
# Configuration File | ||
The configuration file is located at `project/taskweaver_config.json`.
You can edit this file to configure TaskWeaver.
The configuration file is in JSON format. For boolean values, use `true` or `false` instead of `True` or `False`.
For null values, use `null` instead of `None` or `"null"`. All other values should be strings in double quotes.
The following table lists the parameters in the configuration file:

| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `llm.model` | The model name used by the language model. | gpt-4 |
| `llm.backup_model` | The model name used for self-correction purposes. | `null` |
| `llm.api_base` | The base URL of the OpenAI API. | `https://api.openai.com/v1` |
| `llm.api_key` | The API key of the OpenAI API. | `null` |
| `llm.api_type` | The type of the OpenAI API, could be `openai` or `azure`. | `openai` |
| `llm.api_version` | The version of the OpenAI API. | `2023-07-01-preview` |
| `llm.response_format` | The response format of the OpenAI API, could be `json_object`, `text` or `null`. | `json_object` |
| `code_interpreter.code_verification_on` | Whether to enable code verification. | `false` |
| `code_interpreter.plugin_only` | Whether to turn on the plugin-only mode. | `false` |
| `code_interpreter.allowed_modules` | The list of allowed modules to import in code generation. | `"pandas", "matplotlib", "numpy", "sklearn", "scipy", "seaborn", "datetime", "typing"` |
| `logging.log_file` | The name of the log file. | `taskweaver.log` |
| `logging.log_folder` | The folder to store the log file. | `logs` |
| `plugin.base_path` | The folder to store plugins. | `${AppBaseDir}/plugins` |
| `planner.example_base_path` | The folder to store planner examples. | `${AppBaseDir}/planner_examples` |
| `planner.prompt_compression` | Whether to compress the chat history for the planner. | `false` |
| `planner.skip_planning` | Whether to skip the LLM planning process and use the default plan. | `false` |
| `code_generator.example_base_path` | The folder to store code interpreter examples. | `${AppBaseDir}/codeinterpreter_examples` |
| `code_generator.prompt_compression` | Whether to compress the chat history for the code interpreter. | `false` |
| `code_generator.enable_auto_plugin_selection` | Whether to enable auto plugin selection. | `false` |
| `code_generator.auto_plugin_selection_topk` | The number of auto-selected plugins in each round. | `3` |
| `session.max_internal_chat_round_num` | The maximum number of internal chat rounds between the Planner and the Code Interpreter. | `10` |
| `session.code_interpreter_only` | Whether to allow users to directly communicate with the Code Interpreter. | `false` |
| `round_compressor.rounds_to_compress` | The number of rounds to compress. | `2` |
| `round_compressor.rounds_to_retain` | The number of rounds to retain. | `3` |

> 💡 `${AppBaseDir}` is the project directory.

> 💡 As of 11/30/2023, the `json_object` and `text` options of `llm.response_format` are only supported by OpenAI models 1106 or later. If you are using an older model, you need to set `llm.response_format` to `null`.

> 💡 Read [this](compression.md) for more information about `planner.prompt_compression` and `code_generator.prompt_compression`.
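
For illustration, a minimal `project/taskweaver_config.json` for the OpenAI API might look like the sketch below; the API key is a placeholder, and any parameter not listed keeps its default from the table above.

```json
{
  "llm.api_type": "openai",
  "llm.api_base": "https://api.openai.com/v1",
  "llm.api_key": "YOUR-API-KEY",
  "llm.model": "gpt-4",
  "llm.response_format": "json_object"
}
```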
# Running pytest

## Quickstart