Skip to content

Commit

Permalink
refactor code generator's chat structure (microsoft#78)
Browse files Browse the repository at this point in the history
Moving original chat content from "executor" to "user". Now, the user is
responsible to report a feedback to the CG. One consequence is that we
need to add an extra post from the user if the end of the conversation
is "ci->user". Because otherwise, there will be no execution result.

This can only happen in examples or during chat history summarization. 

misc:
- changed cg's prompt file name
- fixed typos in prompts
- fixed old uts and add new uts
  • Loading branch information
liqul authored Dec 19, 2023
1 parent aa9b218 commit 01429e5
Show file tree
Hide file tree
Showing 17 changed files with 272 additions and 116 deletions.
2 changes: 1 addition & 1 deletion docs/example.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,7 @@ rounds:
A code interpreter example tells LLMs how to generate code or orchestrate plugins to perform a specific task.
The task is from the planner. Before constructing the code interpreter example, we strongly encourage you to
read the [code generator prompt](../taskweaver/code_interpreter/code_generator/code_generator_json_prompt.yaml).
read the [code generator prompt](../taskweaver/code_interpreter/code_generator/code_generator_prompt.yaml).
The following is an example of a code interpreter example which contains 2 posts.
Each post contains a message, a sender, a receiver, and a list of attachments.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,12 @@ rounds:
send_from: Planner
send_to: CodeInterpreter
attachment_list: []
- message: Greetings! {ROLE_NAME} can understand the user request and generate syntactically correct python code to complete tasks and can utilize pre-defined plugins in the form of python functions to achieve tasks.
- message: Greetings! I can understand the user request and generate syntactically correct python code to complete tasks and can utilize pre-defined plugins in the form of python functions to achieve tasks.
send_from: CodeInterpreter
send_to: Planner
attachment_list:
- type: text
content: Greetings! {ROLE_NAME} can understand the user request and generate syntactically correct python code to complete tasks and can utilize pre-defined plugins in the form of python functions to achieve tasks.
content: Greetings! I can understand the user request and generate syntactically correct python code to complete tasks and can utilize pre-defined plugins in the form of python functions to achieve tasks.
- type: verification
content: NONE
- type: code_error
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ rounds:
send_to: Planner
attachment_list:
- type: thought
content: "{ROLE_NAME} understands that the execution of the previous round has fell."
content: "{ROLE_NAME} understands that the execution of the previous round has failed."
- type: thought
content: "{ROLE_NAME} understands that the file /abc/def.txt does not exist and will not attempt to read it again."
- type: text
Expand Down
2 changes: 2 additions & 0 deletions project/plugins/klarna_search.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ enabled: true
required: false
description: >-
Search and compare prices from thousands of online shops. Only available in the US.
This plugin only takes user requests when searching for merchandise.
If not clear, confirm with the user if they want to search for merchandise from Klarna.
parameters:
- name: query
Expand Down
3 changes: 3 additions & 0 deletions project/plugins/sql_pull_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,9 @@ def __call__(self, query: str):
{schema}
Question: {question}
Please only write the sql query.
Do not add any comments or extra text.
Do not wrap the query in quotes or ```sql.
SQL Query:"""
prompt = ChatPromptTemplate.from_template(template)

Expand Down
5 changes: 3 additions & 2 deletions project/plugins/sql_pull_data.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,9 @@ name: sql_pull_data
enabled: true
required: false
description: >-
Pull data from a SQL database. This plugin takes user requests when obtaining data from database is explicitly mentioned.
Otherwise, it is not sure if the user wants to pull data from database or not.
Pull data from a SQL database.
This plugin takes user requests when obtaining data from database is explicitly mentioned.
Otherwise, confirm with the user if they want to pull data from this database.
The data from this database can only used for anomaly detection.
parameters:
Expand Down
95 changes: 74 additions & 21 deletions taskweaver/code_interpreter/code_generator/code_generator.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ def _configure(self) -> None:
"prompt_file_path",
os.path.join(
os.path.dirname(os.path.abspath(__file__)),
"code_generator_json_prompt.yaml",
"code_generator_prompt.yaml",
),
)
self.example_base_path = self._get_path(
Expand Down Expand Up @@ -147,15 +147,15 @@ def compose_prompt(
self.examples = self.load_examples(plugin_only=self.plugin_only)
for i, example in enumerate(self.examples):
chat_history.extend(
self.compose_conversation(example.rounds, example.plugins),
self.compose_conversation(example.rounds, example.plugins, add_requirements=False),
)

summary = None
if self.config.prompt_compression:
summary, rounds = self.round_compressor.compress_rounds(
rounds,
rounds_formatter=lambda _rounds: str(
self.compose_conversation(_rounds, plugins),
self.compose_conversation(_rounds, plugins, add_requirements=False),
),
use_back_up_engine=True,
prompt_template=self.compression_template,
Expand All @@ -171,27 +171,36 @@ def compose_prompt(
)
return chat_history

def format_attachment(self, attachment: Attachment):
if attachment.type == AttachmentType.thought:
return attachment.content.format(ROLE_NAME=self.role_name)
else:
return attachment.content

def compose_conversation(
self,
rounds: List[Round],
plugins: List[PluginEntry],
add_requirements: bool = False,
summary: Optional[str] = None,
) -> List[ChatMessageType]:
def format_attachment(attachment: Attachment):
if attachment.type == AttachmentType.thought:
return attachment.content.format(ROLE_NAME=self.role_name)
else:
return attachment.content

chat_history: List[ChatMessageType] = []
ignored_types = [
AttachmentType.revise_message,
AttachmentType.verification,
AttachmentType.code_error,
AttachmentType.execution_status,
AttachmentType.execution_result,
]

is_first_post = True
last_post: Post = None
for round_index, conversation_round in enumerate(rounds):
for post_index, post in enumerate(conversation_round.post_list):
# compose user query
user_message = ""
assistant_message = ""

is_final_post = round_index == len(rounds) - 1 and post_index == len(conversation_round.post_list) - 1
if is_first_post:
user_message = (
self.conversation_head_template.format(
Expand All @@ -209,37 +218,53 @@ def format_attachment(attachment: Attachment):
enrichment = ""
if plan is not None:
enrichment = (
f"To complete this request:{user_query}\n\n"
f"To complete this request: {user_query}\n\n"
f"I have drawn up a plan: \n{plan}\n\n"
f"Please proceed with this step of this plan:"
)

user_feedback = "None"
if last_post is not None and last_post.send_from == "CodeInterpreter":
user_feedback = format_code_feedback(last_post)

user_message += self.user_message_head_template.format(
FEEDBACK=user_feedback,
MESSAGE=f"{enrichment}{post.message}",
)
elif post.send_from == "CodeInterpreter" and post.send_to == "CodeInterpreter":
elif post.send_from == post.send_to == "CodeInterpreter":
# for code correction
user_message += self.user_message_head_template.format(
FEEDBACK=format_code_feedback(post),
MESSAGE=f"{post.get_attachment(AttachmentType.revise_message)[0]}",
)

assistant_message = self.post_translator.post_to_raw_text(
post=post,
content_formatter=format_attachment,
content_formatter=self.format_attachment,
if_format_message=False,
if_format_send_to=False,
ignore_types=[AttachmentType.revise_message],
ignored_types=ignored_types,
)
elif post.send_from == "CodeInterpreter" and post.send_to == "Planner":
if is_final_post:
# This user message is added to make the conversation complete
# It is used to make sure the last assistant message has a feedback
# This is only used for examples or context summarization
user_message += self.user_message_head_template.format(
FEEDBACK=format_code_feedback(post),
MESSAGE="This is the feedback.",
)

assistant_message = self.post_translator.post_to_raw_text(
post=post,
content_formatter=format_attachment,
content_formatter=self.format_attachment,
if_format_message=False,
if_format_send_to=False,
ignore_types=[AttachmentType.revise_message],
ignored_types=ignored_types,
)
else:
raise ValueError(f"Invalid post: {post}")
last_post = post

if len(assistant_message) > 0:
chat_history.append(
Expand All @@ -250,11 +275,9 @@ def format_attachment(attachment: Attachment):
)
if len(user_message) > 0:
# add requirements to the last user message
if add_requirements and post_index == len(conversation_round.post_list) - 1:
if is_final_post and add_requirements:
user_message += "\n" + self.query_requirements_template.format(
PLUGIN_ONLY_PROMPT=self.compose_verification_requirements(
plugins,
),
CODE_GENERATION_REQUIREMENTS=self.compose_verification_requirements(plugins),
ROLE_NAME=self.role_name,
)
chat_history.append(
Expand Down Expand Up @@ -300,7 +323,7 @@ def reply(

prompt = self.compose_prompt(rounds, self.plugin_pool)

def early_stop(_type: AttachmentType, value: str):
def early_stop(_type: AttachmentType, value: str) -> bool:
if _type in [AttachmentType.text, AttachmentType.python, AttachmentType.sample]:
return True
else:
Expand Down Expand Up @@ -377,3 +400,33 @@ def format_output_revision_message() -> str:
"Don't surround the JSON with ```json and ```, just send the JSON object directly.\n"
"Please try again."
)


def format_code_feedback(post: Post) -> str:
feedback = ""
verification_status = ""
execution_status = ""
for attachment in post.attachment_list:
if attachment.type == AttachmentType.verification and attachment.content == "CORRECT":
feedback += "## Verification\nI have verified that your code is CORRECT.\n"
verification_status = "CORRECT"
elif attachment.type == AttachmentType.verification and attachment.content == "NONE":
feedback += "## Verification\nNo code verification.\n"
verification_status = "NONE"
elif attachment.type == AttachmentType.verification and attachment.content == "INCORRECT":
feedback += "## Verification\nYour code is INCORRECT with the following error:\n"
verification_status = "INCORRECT"
elif attachment.type == AttachmentType.code_error and verification_status == "INCORRECT":
feedback += f"{attachment.content}\n"
elif attachment.type == AttachmentType.execution_status and attachment.content == "NONE":
feedback += "## Execution\nNo code execution.\n"
execution_status = "NONE"
elif attachment.type == AttachmentType.execution_status and attachment.content == "SUCCESS":
feedback += "## Execution\nYour code has been executed successfully with the following result:\n"
execution_status = "SUCCESS"
elif attachment.type == AttachmentType.execution_status and attachment.content == "FAILURE":
feedback += "## Execution\nYour code has failed to execute with the following error:\n"
execution_status = "FAILURE"
elif attachment.type == AttachmentType.execution_result and execution_status != "NONE":
feedback += f"{attachment.content}\n"
return feedback
Original file line number Diff line number Diff line change
Expand Up @@ -4,46 +4,38 @@ content: |-
- Each conversation starts with "==============================\n## Conversation Start"
- Each conversation has multiple rounds, each round starts with "-----------------------------"
- Each conversation has a context summary and definitions of plugin functions, both could be none.
- Each conversation is between the {ROLE_NAME} and the User.
## On {ROLE_NAME}'s profile and general capabilities:
- {ROLE_NAME} can understand the user request and generate syntactically correct python code to complete tasks.
- {ROLE_NAME} can utilize pre-defined plugins of python functions to achieve tasks.
- {ROLE_NAME} can utilize pre-defined python functions (a.k.a plugins) to achieve tasks.
- {ROLE_NAME} is prohibited to define functions that have been defined as plugins.
- {ROLE_NAME} is prohibited to use plugins defined in previous conversations.
- {ROLE_NAME} can only refer to variables in the generated code from previous successful rounds in the current Conversation, but should not refer to any information from failed rounds, rounds that have not been executed, or previous Conversations.
- {ROLE_NAME} should import other libraries if needed; if the library is not pre-installed, {ROLE_NAME} should install it in {EXECUTOR_NAME} as long as the user does not forbid it.
- {ROLE_NAME} verifies the correctness of the generated code. If the code is incorrect, {ROLE_NAME} will generate a verification error message.
- {ROLE_NAME} should import other libraries if needed; if the library is not pre-installed, {ROLE_NAME} should install it (with !pip) as long as the user does not forbid it.
- {ROLE_NAME} must respond to the User's feedback with a new code that addresses the feedback.
## On {EXECUTOR_NAME}'s profile and general capabilities:
- {EXECUTOR_NAME} executes the generated python code from {ROLE_NAME}.
- {EXECUTOR_NAME} is backed by a stateful python Jupyter kernel.
- {EXECUTOR_NAME} has three possible status: SUCCESS, FAILURE, and NONE.
- SUCCESS means the code has been executed successfully.
- FAILURE means the code has been executed unsuccessfully due to exceptions or errors.
- NONE means no code has not been executed.
## On User's profile and general capabilities:
- Upon receiving code from {ROLE_NAME}, the User will verify the correctness of the generated code by {ROLE_NAME} before executing it.
- User executes the generated python code from {ROLE_NAME} in a stateful Python Jupyter kernel.
- If any error occurs during the verification or execution, the User will provide feedback to the {ROLE_NAME}.
## On response format:
- The response is a JSON list of dictionaries, each dictionary represents a reply that has a key named 'type' and a key named 'content'.
- The JSON list contains replies from {ROLE_NAME} and {EXECUTOR_NAME}.
## On {ROLE_NAME}'s response format:
- The response is a JSON array of dictionaries, each dictionary has a key named 'type' and a key named 'content', i.e., {{"response": [{{"type": "thought", "content": "..." }}, ...]}}
- {ROLE_NAME} generates the reply to the user with 'type' that must be one of the following:
- "thought": the thoughts on the intermediate steps
- "sample": textual descriptions including the sample code
- "python": the code that can be executed by {EXECUTOR_NAME}; comments must be added calling functions from the pre-defined plugins, including the description of the function and the parameters.
- "text": the direct response in text without code
- "verification": the verification status on correctness of the generated code that can be CORRECT, INCORRECT, or NONE
- "code_error": the verification error message if the generated code is INCORRECT
- The JSON list can include multiple thought replies, but it can have only one of the following: sample, python, or text, exclusively.
- {EXECUTOR_NAME} generates replies to the user with 'type' that must be one of the following:
- "execution_status": the execution status of the code generated by {ROLE_NAME}, could be SUCCESS, FAILURE, or NONE
- "execution_result": the code execution result by {EXECUTOR_NAME} including the output and the error message
- The value of 'content' is a string that contains the actual content of the reply in markdown syntax.
- The "response" array can include multiple thought replies, but it can have only one of sample, python, or text, exclusively.
- The value of "content" is a string that contains the actual content and {ROLE_NAME} must be very careful about escaping the special characters (e.g., '\', '/', and '"') in the string for JSON format.
conversation_head: |-
==============================
## Conversation Start
### Context Summary
The context summary of the previous rounds and a list of variables that {ROLE_NAME} can refer to:
The context summary of previous rounds and the variables that {ROLE_NAME} can refer to:
{SUMMARY}
### Plugin Functions
Expand All @@ -52,13 +44,18 @@ conversation_head: |-
user_message_head: |-
-----------------------------
- User: {MESSAGE}
# Feedback of the code in the last round (None if no feedback):
{FEEDBACK}
# Request from the User in this round:
{MESSAGE}
requirements: |-
Please follow the instructions below to complete the task:
- {ROLE_NAME} can refer to intermediate variables in the generated code from previous successful rounds and the context summary in the current Conversation,
- {ROLE_NAME} should not refer to any information from failed rounds, rounds that have not been executed, or previous Conversations.
- {ROLE_NAME} put all the result variables in the last line of the code.
- {ROLE_NAME} should leave "verification", "code_error", "execution_status", and "execution_result" empty in the response.
- {ROLE_NAME} must not import the plugins and otherwise the code will be failed to execute.
{PLUGIN_ONLY_PROMPT}
{CODE_GENERATION_REQUIREMENTS}
Loading

0 comments on commit 01429e5

Please sign in to comment.