refactor code generator's chat structure (microsoft#78)

Moving original chat content from "executor" to "user". Now, the user is responsible to report a feedback to the CG. One consequence is that we need to add an extra post from the user if the end of the conversation is "ci->user". Because otherwise, there will be no execution result. This can only happen in examples or during chat history summarization. misc: - changed cg's prompt file name - fixed typos in prompts - fixed old uts and add new uts
kisejin · Dec 19, 2023 · 01429e5 · 01429e5
1 parent aa9b218
commit 01429e5
Show file tree

Hide file tree

Showing 17 changed files with 272 additions and 116 deletions.
diff --git a/docs/example.md b/docs/example.md
@@ -79,7 +79,7 @@ rounds:
 
 A code interpreter example tells LLMs how to generate code or orchestrate plugins to perform a specific task.
 The task is from the planner. Before constructing the code interpreter example, we strongly encourage you to
-read the [code generator prompt](../taskweaver/code_interpreter/code_generator/code_generator_json_prompt.yaml). 
+read the [code generator prompt](../taskweaver/code_interpreter/code_generator/code_generator_prompt.yaml). 
 
 The following is an example of a code interpreter example which contains 2 posts.
 Each post contains a message, a sender, a receiver, and a list of attachments.

diff --git a/project/codeinterpreter_examples/example1-codeinterpreter.yaml b/project/codeinterpreter_examples/example1-codeinterpreter.yaml
@@ -8,12 +8,12 @@ rounds:
         send_from: Planner
         send_to: CodeInterpreter
         attachment_list: []
-      - message: Greetings! {ROLE_NAME} can understand the user request and generate syntactically correct python code to complete tasks and can utilize pre-defined plugins in the form of python functions to achieve tasks.
+      - message: Greetings! I can understand the user request and generate syntactically correct python code to complete tasks and can utilize pre-defined plugins in the form of python functions to achieve tasks.
         send_from: CodeInterpreter
         send_to: Planner
         attachment_list:
           - type: text
-            content: Greetings! {ROLE_NAME} can understand the user request and generate syntactically correct python code to complete tasks and can utilize pre-defined plugins in the form of python functions to achieve tasks.
+            content: Greetings! I can understand the user request and generate syntactically correct python code to complete tasks and can utilize pre-defined plugins in the form of python functions to achieve tasks.
           - type: verification
             content: NONE
           - type: code_error

diff --git a/project/codeinterpreter_examples/example2-codeinterpreter.yaml b/project/codeinterpreter_examples/example2-codeinterpreter.yaml
@@ -41,7 +41,7 @@ rounds:
         send_to: Planner
         attachment_list:
           - type: thought
-            content: "{ROLE_NAME} understands that the execution of the previous round has fell."
+            content: "{ROLE_NAME} understands that the execution of the previous round has failed."
           - type: thought
             content: "{ROLE_NAME} understands that the file /abc/def.txt does not exist and will not attempt to read it again."
           - type: text

diff --git a/project/plugins/klarna_search.yaml b/project/plugins/klarna_search.yaml
@@ -3,6 +3,8 @@ enabled: true
 required: false
 description: >-
   Search and compare prices from thousands of online shops. Only available in the US.
+  This plugin only takes user requests when searching for merchandise.
+  If not clear, confirm with the user if they want to search for merchandise from Klarna.
 
 parameters:
   - name: query

diff --git a/project/plugins/sql_pull_data.py b/project/plugins/sql_pull_data.py
@@ -37,6 +37,9 @@ def __call__(self, query: str):
             {schema}
 
             Question: {question}
+            Please only write the sql query.
+            Do not add any comments or extra text.
+            Do not wrap the query in quotes or ```sql.
             SQL Query:"""
         prompt = ChatPromptTemplate.from_template(template)
 

diff --git a/project/plugins/sql_pull_data.yaml b/project/plugins/sql_pull_data.yaml
@@ -2,8 +2,9 @@ name: sql_pull_data
 enabled: true
 required: false
 description: >-
-  Pull data from a SQL database. This plugin takes user requests when obtaining data from database is explicitly mentioned.
-  Otherwise, it is not sure if the user wants to pull data from database or not.
+  Pull data from a SQL database. 
+  This plugin takes user requests when obtaining data from database is explicitly mentioned.
+  Otherwise, confirm with the user if they want to pull data from this database.
   The data from this database can only used for anomaly detection.
 
 parameters:

diff --git a/taskweaver/code_interpreter/code_generator/code_generator.py b/taskweaver/code_interpreter/code_generator/code_generator.py
@@ -27,7 +27,7 @@ def _configure(self) -> None:
             "prompt_file_path",
             os.path.join(
                 os.path.dirname(os.path.abspath(__file__)),
-                "code_generator_json_prompt.yaml",
+                "code_generator_prompt.yaml",
             ),
         )
         self.example_base_path = self._get_path(
@@ -147,15 +147,15 @@ def compose_prompt(
             self.examples = self.load_examples(plugin_only=self.plugin_only)
         for i, example in enumerate(self.examples):
             chat_history.extend(
-                self.compose_conversation(example.rounds, example.plugins),
+                self.compose_conversation(example.rounds, example.plugins, add_requirements=False),
             )
 
         summary = None
         if self.config.prompt_compression:
             summary, rounds = self.round_compressor.compress_rounds(
                 rounds,
                 rounds_formatter=lambda _rounds: str(
-                    self.compose_conversation(_rounds, plugins),
+                    self.compose_conversation(_rounds, plugins, add_requirements=False),
                 ),
                 use_back_up_engine=True,
                 prompt_template=self.compression_template,
@@ -171,27 +171,36 @@ def compose_prompt(
         )
         return chat_history
 
+    def format_attachment(self, attachment: Attachment):
+        if attachment.type == AttachmentType.thought:
+            return attachment.content.format(ROLE_NAME=self.role_name)
+        else:
+            return attachment.content
+
     def compose_conversation(
         self,
         rounds: List[Round],
         plugins: List[PluginEntry],
         add_requirements: bool = False,
         summary: Optional[str] = None,
     ) -> List[ChatMessageType]:
-        def format_attachment(attachment: Attachment):
-            if attachment.type == AttachmentType.thought:
-                return attachment.content.format(ROLE_NAME=self.role_name)
-            else:
-                return attachment.content
-
         chat_history: List[ChatMessageType] = []
+        ignored_types = [
+            AttachmentType.revise_message,
+            AttachmentType.verification,
+            AttachmentType.code_error,
+            AttachmentType.execution_status,
+            AttachmentType.execution_result,
+        ]
+
         is_first_post = True
+        last_post: Post = None
         for round_index, conversation_round in enumerate(rounds):
             for post_index, post in enumerate(conversation_round.post_list):
                 # compose user query
                 user_message = ""
                 assistant_message = ""
-
+                is_final_post = round_index == len(rounds) - 1 and post_index == len(conversation_round.post_list) - 1
                 if is_first_post:
                     user_message = (
                         self.conversation_head_template.format(
@@ -209,37 +218,53 @@ def format_attachment(attachment: Attachment):
                     enrichment = ""
                     if plan is not None:
                         enrichment = (
-                            f"To complete this request:{user_query}\n\n"
+                            f"To complete this request: {user_query}\n\n"
                             f"I have drawn up a plan: \n{plan}\n\n"
                             f"Please proceed with this step of this plan:"
                         )
 
+                    user_feedback = "None"
+                    if last_post is not None and last_post.send_from == "CodeInterpreter":
+                        user_feedback = format_code_feedback(last_post)
+
                     user_message += self.user_message_head_template.format(
+                        FEEDBACK=user_feedback,
                         MESSAGE=f"{enrichment}{post.message}",
                     )
-                elif post.send_from == "CodeInterpreter" and post.send_to == "CodeInterpreter":
+                elif post.send_from == post.send_to == "CodeInterpreter":
                     # for code correction
                     user_message += self.user_message_head_template.format(
+                        FEEDBACK=format_code_feedback(post),
                         MESSAGE=f"{post.get_attachment(AttachmentType.revise_message)[0]}",
                     )
 
                     assistant_message = self.post_translator.post_to_raw_text(
                         post=post,
-                        content_formatter=format_attachment,
+                        content_formatter=self.format_attachment,
                         if_format_message=False,
                         if_format_send_to=False,
-                        ignore_types=[AttachmentType.revise_message],
+                        ignored_types=ignored_types,
                     )
                 elif post.send_from == "CodeInterpreter" and post.send_to == "Planner":
+                    if is_final_post:
+                        # This user message is added to make the conversation complete
+                        # It is used to make sure the last assistant message has a feedback
+                        # This is only used for examples or context summarization
+                        user_message += self.user_message_head_template.format(
+                            FEEDBACK=format_code_feedback(post),
+                            MESSAGE="This is the feedback.",
+                        )
+
                     assistant_message = self.post_translator.post_to_raw_text(
                         post=post,
-                        content_formatter=format_attachment,
+                        content_formatter=self.format_attachment,
                         if_format_message=False,
                         if_format_send_to=False,
-                        ignore_types=[AttachmentType.revise_message],
+                        ignored_types=ignored_types,
                     )
                 else:
                     raise ValueError(f"Invalid post: {post}")
+                last_post = post
 
                 if len(assistant_message) > 0:
                     chat_history.append(
@@ -250,11 +275,9 @@ def format_attachment(attachment: Attachment):
                     )
                 if len(user_message) > 0:
                     # add requirements to the last user message
-                    if add_requirements and post_index == len(conversation_round.post_list) - 1:
+                    if is_final_post and add_requirements:
                         user_message += "\n" + self.query_requirements_template.format(
-                            PLUGIN_ONLY_PROMPT=self.compose_verification_requirements(
-                                plugins,
-                            ),
+                            CODE_GENERATION_REQUIREMENTS=self.compose_verification_requirements(plugins),
                             ROLE_NAME=self.role_name,
                         )
                     chat_history.append(
@@ -300,7 +323,7 @@ def reply(
 
         prompt = self.compose_prompt(rounds, self.plugin_pool)
 
-        def early_stop(_type: AttachmentType, value: str):
+        def early_stop(_type: AttachmentType, value: str) -> bool:
             if _type in [AttachmentType.text, AttachmentType.python, AttachmentType.sample]:
                 return True
             else:
@@ -377,3 +400,33 @@ def format_output_revision_message() -> str:
         "Don't surround the JSON with ```json and ```, just send the JSON object directly.\n"
         "Please try again."
     )
+
+
+def format_code_feedback(post: Post) -> str:
+    feedback = ""
+    verification_status = ""
+    execution_status = ""
+    for attachment in post.attachment_list:
+        if attachment.type == AttachmentType.verification and attachment.content == "CORRECT":
+            feedback += "## Verification\nI have verified that your code is CORRECT.\n"
+            verification_status = "CORRECT"
+        elif attachment.type == AttachmentType.verification and attachment.content == "NONE":
+            feedback += "## Verification\nNo code verification.\n"
+            verification_status = "NONE"
+        elif attachment.type == AttachmentType.verification and attachment.content == "INCORRECT":
+            feedback += "## Verification\nYour code is INCORRECT with the following error:\n"
+            verification_status = "INCORRECT"
+        elif attachment.type == AttachmentType.code_error and verification_status == "INCORRECT":
+            feedback += f"{attachment.content}\n"
+        elif attachment.type == AttachmentType.execution_status and attachment.content == "NONE":
+            feedback += "## Execution\nNo code execution.\n"
+            execution_status = "NONE"
+        elif attachment.type == AttachmentType.execution_status and attachment.content == "SUCCESS":
+            feedback += "## Execution\nYour code has been executed successfully with the following result:\n"
+            execution_status = "SUCCESS"
+        elif attachment.type == AttachmentType.execution_status and attachment.content == "FAILURE":
+            feedback += "## Execution\nYour code has failed to execute with the following error:\n"
+            execution_status = "FAILURE"
+        elif attachment.type == AttachmentType.execution_result and execution_status != "NONE":
+            feedback += f"{attachment.content}\n"
+    return feedback
diff --git a/...generator/code_generator_json_prompt.yaml → ...code_generator/code_generator_prompt.yaml b/...generator/code_generator_json_prompt.yaml → ...code_generator/code_generator_prompt.yaml
@@ -4,46 +4,38 @@ content: |-
     - Each conversation starts with "==============================\n## Conversation Start"
     - Each conversation has multiple rounds, each round starts with "-----------------------------"
     - Each conversation has a context summary and definitions of plugin functions, both could be none.
+    - Each conversation is between the {ROLE_NAME} and the User.
     
     ## On {ROLE_NAME}'s profile and general capabilities:
     - {ROLE_NAME} can understand the user request and generate syntactically correct python code to complete tasks.
-    - {ROLE_NAME} can utilize pre-defined plugins of python functions to achieve tasks.
+    - {ROLE_NAME} can utilize pre-defined python functions (a.k.a plugins) to achieve tasks.
     - {ROLE_NAME} is prohibited to define functions that have been defined as plugins.
     - {ROLE_NAME} is prohibited to use plugins defined in previous conversations.
     - {ROLE_NAME} can only refer to variables in the generated code from previous successful rounds in the current Conversation, but should not refer to any information from failed rounds, rounds that have not been executed, or previous Conversations.
-    - {ROLE_NAME} should import other libraries if needed; if the library is not pre-installed, {ROLE_NAME} should install it in {EXECUTOR_NAME} as long as the user does not forbid it.
-    - {ROLE_NAME} verifies the correctness of the generated code. If the code is incorrect, {ROLE_NAME} will generate a verification error message.
+    - {ROLE_NAME} should import other libraries if needed; if the library is not pre-installed, {ROLE_NAME} should install it (with !pip) as long as the user does not forbid it.
+    - {ROLE_NAME} must respond to the User's feedback with a new code that addresses the feedback.
     
-    ## On {EXECUTOR_NAME}'s profile and general capabilities:
-    - {EXECUTOR_NAME} executes the generated python code from {ROLE_NAME}.
-    - {EXECUTOR_NAME} is backed by a stateful python Jupyter kernel. 
-    - {EXECUTOR_NAME} has three possible status: SUCCESS, FAILURE, and NONE.
-      - SUCCESS means the code has been executed successfully.
-      - FAILURE means the code has been executed unsuccessfully due to exceptions or errors.
-      - NONE means no code has not been executed.
+    ## On User's profile and general capabilities:
+    - Upon receiving code from {ROLE_NAME}, the User will verify the correctness of the generated code by {ROLE_NAME} before executing it.
+    - User executes the generated python code from {ROLE_NAME} in a stateful Python Jupyter kernel. 
+    - If any error occurs during the verification or execution, the User will provide feedback to the {ROLE_NAME}.
 
-    ## On response format:
-    - The response is a JSON list of dictionaries, each dictionary represents a reply that has a key named 'type' and a key named 'content'.
-    - The JSON list contains replies from {ROLE_NAME} and {EXECUTOR_NAME}.
+    ## On {ROLE_NAME}'s response format:
+    - The response is a JSON array of dictionaries, each dictionary has a key named 'type' and a key named 'content', i.e., {{"response": [{{"type": "thought", "content": "..." }}, ...]}}
     - {ROLE_NAME} generates the reply to the user with 'type' that must be one of the following:
       - "thought": the thoughts on the intermediate steps
       - "sample": textual descriptions including the sample code 
       - "python": the code that can be executed by {EXECUTOR_NAME}; comments must be added calling functions from the pre-defined plugins, including the description of the function and the parameters.
       - "text": the direct response in text without code
-      - "verification": the verification status on correctness of the generated code that can be CORRECT, INCORRECT, or NONE
-      - "code_error": the verification error message if the generated code is INCORRECT
-    - The JSON list can include multiple thought replies, but it can have only one of the following: sample, python, or text, exclusively.
-    - {EXECUTOR_NAME} generates replies to the user with 'type' that must be one of the following:
-      - "execution_status": the execution status of the code generated by {ROLE_NAME}, could be SUCCESS, FAILURE, or NONE
-      - "execution_result": the code execution result by {EXECUTOR_NAME} including the output and the error message
-    - The value of 'content' is a string that contains the actual content of the reply in markdown syntax.
+    - The "response" array can include multiple thought replies, but it can have only one of sample, python, or text, exclusively.
+    - The value of "content" is a string that contains the actual content and {ROLE_NAME} must be very careful about escaping the special characters (e.g., '\', '/', and '"') in the string for JSON format.
     
 conversation_head: |-
     ==============================
     ## Conversation Start
     
     ### Context Summary
-    The context summary of the previous rounds and a list of variables that {ROLE_NAME} can refer to:
+    The context summary of previous rounds and the variables that {ROLE_NAME} can refer to:
     {SUMMARY}
     
     ### Plugin Functions
@@ -52,13 +44,18 @@ conversation_head: |-
 
 user_message_head: |-
     -----------------------------
-    - User: {MESSAGE}
+    # Feedback of the code in the last round (None if no feedback):
+    {FEEDBACK}
+    
+    # Request from the User in this round:
+    {MESSAGE}
+    
+    
 
 requirements: |-
     Please follow the instructions below to complete the task:
     - {ROLE_NAME} can refer to intermediate variables in the generated code from previous successful rounds and the context summary in the current Conversation, 
     - {ROLE_NAME} should not refer to any information from failed rounds, rounds that have not been executed, or previous Conversations.
     - {ROLE_NAME} put all the result variables in the last line of the code.
-    - {ROLE_NAME} should leave "verification", "code_error", "execution_status", and "execution_result" empty in the response. 
     - {ROLE_NAME} must not import the plugins and otherwise the code will be failed to execute.
-    {PLUGIN_ONLY_PROMPT}
+    {CODE_GENERATION_REQUIREMENTS}