Code capability enhancement & Bot crash fix #272

Ninot1Quyi · 2024-11-02T13:34:01Z

Last Modified Time: November 10, 2024, 5:53 PM

Latest changes are as follows:

Improvement Effects
- Model: GPT-4o
- Initial Command: !goal("Your goal is: use only "!newAction" instructions and rely only on code execution to obtain a diamond pickaxe. You must complete this task step by step and by yourself. And can't use another "!command". You should promptly check to see what you have.")
- Effect: After testing, under the condition of relying solely on generated code, the bot can run stably for at least 30 minutes without crashing (I manually ended the process at 30 minutes), during which it executed over 130 validated code snippets.
- Remaining Issues:
  1. If illegal commands are executed, such as "attacking a non-existent entity," the server may kick the bot out.
  2. A very small number of tasks may lead to no execution result being obtained, causing code crashes. It is suspected that there may be unconsidered exceptional situations when receiving task results.
- WARNING: If you use the command above or set any long-running goal, watch the execution status and token consumption, because the LLM may keep generating code in certain situations. For example, when "an iron pickaxe is available and diamonds need to be mined," the bot might stand still and use its code abilities to search for nearby diamonds. Since diamonds are rare, the search may fail repeatedly while the LLM keeps revising the code, which gets it stuck and consumes a large number of tokens. Please test with caution: testing with gpt-4o for 60 minutes cost me $60. gpt-4o-mini is much cheaper and can be used to test this command.
Added Features:
2.1 During code generation, the top select_num skillsDocs most relevant to !newAction("task") are selected and sent to the LLM in the prompt to help it focus on the task. Currently, select_num is set to 5.
2.2 Before the code runs, ESLint checks the generated code for syntax errors and exceptions, and a separate check looks for undefined functions. Any problems found are added to messages.
2.3 During code execution, detailed error information is included in messages.
Added Files:
3.1 file path: ./bots/codeCheckTemplate.js
A template used for checking code before execution. It is needed because ESLint cannot be used for detection inside the sandbox.

3.2 file path: ./eslint.config.js
Manages the ESLint rules for code syntax and exception detection.
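
To make the role of eslint.config.js more concrete, here is a minimal sketch of what an ESLint flat-config array for checking generated code could look like. The rule selection and the global names (skills, world, Vec3, log) are assumptions for illustration, not the contents of the file in this PR. In a real eslint.config.js the array would be the module's default export.

```javascript
// Hypothetical flat-config array; the actual rules in the PR may differ.
const lintConfig = [
  {
    files: ["**/*.js"],
    languageOptions: {
      ecmaVersion: 2022,
      sourceType: "module",
      // Names assumed to be injected by the code template,
      // declared here so that "no-undef" does not flag them.
      globals: {
        skills: "readonly",
        world: "readonly",
        Vec3: "readonly",
        log: "readonly",
      },
    },
    rules: {
      "no-undef": "error",       // catches calls to names that do not exist
      "no-unreachable": "error", // catches code after return/throw
      "no-unused-vars": "off",   // generated code often leaves unused variables
    },
  },
];
```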
Modified Code Content:

4.1 package.json

- Added: ESLint dependency.

4.2 settings.js

- Set: code_timeout_mins=3, ensuring timely code execution updates and preventing long blocks.
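
As an illustration of what a minute-based code timeout does, the following sketch races a task against a timer. runWithTimeout is a hypothetical helper written for this example and is not the mechanism used in the PR.

```javascript
// Run an async task but stop waiting after `timeoutMins` minutes.
// `runWithTimeout` is a hypothetical name, not a function from the PR.
function runWithTimeout(task, timeoutMins) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(
      () => resolve({ timedOut: true, result: null }),
      timeoutMins * 60 * 1000
    );
  });
  const work = Promise.resolve()
    .then(task)
    .then((result) => ({ timedOut: false, result }));
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Note that racing only stops the caller from waiting. The task itself keeps running unless it is interrupted separately.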

4.3 coder.js

- Added: checkCode function to pre-check for syntax and exceptions. First, it checks whether the functions used in the code exist. If they don't, it writes the illegal functions to the message, then proceeds with syntax and exception checks.

- Modified: Changed the return value of the stageCode function from return { main: mainFn }; to return { func: { main: mainFn }, src_check_copy: src_check_copy }; to support pre-execution exception detection.
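
The first step of checkCode described above (checking whether the functions used in the code exist) can be sketched roughly as follows. This is a simplified regex-based illustration, not the PR's implementation: findUnknownCalls and the allowlist contents are assumptions, and a real check would take the function names from the skill library.

```javascript
// Hypothetical allowlist; a real check would build it from the skill library docs.
const knownFunctions = new Set([
  "skills.collectBlock",
  "skills.craftRecipe",
  "world.getInventoryCounts",
]);

// Return every `skills.x(` or `world.x(` call in `src` that is not in the allowlist.
function findUnknownCalls(src, allowed) {
  const calls = src.match(/\b(?:skills|world)\.\w+(?=\s*\()/g) || [];
  return [...new Set(calls)].filter((name) => !allowed.has(name));
}

const generated = `
await skills.collectBlock(bot, "oak_log", 3);
await skills.mineDiamonds(bot);
`;

const unknown = findUnknownCalls(generated, knownFunctions);
// Message that would be handed back to the LLM before running anything.
const feedback = unknown.length
  ? `These functions do not exist: ${unknown.join(", ")}`
  : "";
```

Reporting the unknown names before execution lets the LLM fix a hallucinated call without spending a run on it.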

4.4 action_manager.js

- Enhanced: catch (err) error detection to include detailed exception content and related code Docs in messages, improving the LLM's ability to fix code.
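
A minimal sketch of turning a caught error into a detailed message for the LLM is shown below. describeError and the message wording are hypothetical, and the text produced in the PR may differ.

```javascript
// Turn a caught error into a compact, detailed message.
// `describeError` is a hypothetical helper, not a function from the PR.
function describeError(err, maxStackLines = 3) {
  const name = err && err.name ? err.name : "Error";
  const message = err && err.message ? err.message : String(err);
  // Keep only the first few stack frames to limit message size.
  const stack = err && err.stack
    ? err.stack.split("\n").slice(1, 1 + maxStackLines).map((line) => line.trim())
    : [];
  return [`Code execution failed with ${name}: ${message}`, ...stack].join("\n");
}

// Usage: wrap the risky call and forward the description.
let report = "";
try {
  null.position; // throws a TypeError
} catch (err) {
  report = describeError(err);
}
```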

4.5 index.js

- Modified: docHelper and getSkillDocs return values to return the docArray of functions from the skill library for subsequent word embedding vector calculations.

4.6 prompter.js

- Added: this.skill_docs_embeddings = {}; to store the docArray word embedding vectors.

- Added: Parallel initialization of this.skill_docs_embeddings in initExamples.

- Added: getRelevantSkillDocs function, which returns the select_num most relevant doc texts for the input message. select_num takes effect only when it is >= 0; otherwise, all docs are returned, sorted by relevance.
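
The selection described above can be sketched as ranking docs by cosine similarity between embedding vectors and keeping the top entries. The function names, the data layout, and the choice of cosine similarity are assumptions for illustration. The handling of select_num follows the description above (a non-negative value is honored, anything else returns everything).

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// docEmbeddings: { [docText]: number[] }, queryEmbedding: number[]
// `selectRelevantDocs` is a hypothetical stand-in for getRelevantSkillDocs.
function selectRelevantDocs(docEmbeddings, queryEmbedding, selectNum) {
  const ranked = Object.entries(docEmbeddings)
    .map(([doc, vec]) => ({ doc, score: cosineSimilarity(vec, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.doc);
  // Out-of-range counts fall back to returning every doc, still sorted.
  const valid = Number.isInteger(selectNum) && selectNum >= 0;
  return ranked.slice(0, valid ? Math.min(selectNum, ranked.length) : ranked.length);
}

// Toy two-dimensional "embeddings" for three docs.
const toyDocs = { mine: [1, 0], craft: [0, 1], smelt: [1, 1] };
const top2 = selectRelevantDocs(toyDocs, [1, 0], 2);
```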

Note: This modification ensures code quality by making minimal changes only where necessary, while also clearing test outputs and comments. If further modifications are needed, please feel free to let me know.

Ninot1Quyi · 2024-11-02T20:04:49Z

Resolve merge conflicts with the latest code

New additions

Added the codeCheckTemplate.js file under the bots directory for static syntax and exception detection.
Modified the return value of stageCode and the part of generateCodeLoop that runs the code to resolve merge conflict issues.
Added the ESLint configuration file eslint.config.js in the project root directory to manage code syntax and exception detection rules.

JurassikLizard · 2024-11-02T22:13:01Z

Can you try re-running this with a stupider model (not state-of-the-art lol). I'm curious to see if they benefit too, or just advanced ones.

Ninot1Quyi · 2024-11-03T13:41:19Z

Comparison Experiment on Low-Performance Models

1. Objective

The objective is set using the following command:
!goal("Your goal is: use only "!newAction" instructions and rely only on code execution to obtain a diamondpickaxe. You must complete this task step by step and by yourself. And can't use another "!command". You should promptly check to see what you have")

2. Model Selection

First, I tested the lowest-performance model, gpt-3.5-turbo, but it could not limit itself to using only !newAction and was unable to complete the task. Subsequently, I tested gpt-4o-mini.
All subsequent tests were conducted using gpt-4o-mini.

3. Experimental Process

Created a world and made two copies to ensure the environment was the same.
Used both modified and unmodified code to enter the same position, input the goal command, and let the bot execute the task.

4. Experimental Results

4.1 Original

Total run time: 16 minutes 41 seconds.

0 min: Start
3 min: First crash.
4 min: Second crash.
13 min: Acquired wooden pickaxe. [The bot was continually collecting wood and only completed the wooden pickaxe after multiple reminders to check existing items.]
16 min 24 s: Acquired a stone pickaxe.

4.2 Modified

I didn’t give any reminders to the bot while it was running.
Total run time: 16 minutes 22 seconds.

0 min: Start
4 min: Acquired a wooden pickaxe.
5 min 12 s: First crash.
8 min: Acquired a stone pickaxe.
11 min: Acquired iron ingots.
15 min: The content was obtained.
16 min 22 s: Second crash.

4.3 Complete Comparison Video

Total duration: 16 minutes 41 seconds.
Watch the full comparison video here

Ninot1Quyi · 2024-11-04T17:11:09Z

Resolved merge conflict with Action Manager

Ninot1Quyi · 2024-11-08T16:11:59Z

There is a part that needs improvement

Ninot1Quyi · 2024-11-08T17:33:21Z

Improve the relevance of docs to !newAction("task")
Fix Qwen api concurrency limit issue

Ninot1Quyi · 2024-12-13T16:11:52Z

I'm back! Just resolving merge conflicts for now. Code migration and improvements are in progress.

Ninot1Quyi · 2024-12-16T10:59:11Z

The embedding concurrency limit issue has been resolved in the latest qwen.js of the current PR, so Qwen's own embedding model can now be used in qwen.json instead of OpenAI's embedding model.
I'm currently working on an issue with codeCheckTemplate.js, which code evaluation depends on. I'm looking for a way to remove the codeCheckTemplate.js file.
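
For readers unfamiliar with how an API concurrency or rate limit is usually worked around, one common approach is to retry with exponential backoff plus random jitter. The sketch below is a generic illustration under that assumption. withBackoff and its options are hypothetical names, and this is not necessarily how qwen.js handles the limit.

```javascript
// Retry an async call with exponential backoff plus random jitter.
// `withBackoff` and its options are hypothetical names for this sketch.
async function withBackoff(call, { retries = 5, baseMs = 500, isRetryable = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up once the retry budget is spent or the error is not retryable.
      if (attempt >= retries || !isRetryable(err)) throw err;
      const delay = baseMs * 2 ** attempt + Math.random() * baseMs;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

The random component spreads out retries from parallel requests so they do not all hit the API again at the same moment.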

Ninot1Quyi · 2025-01-03T10:15:46Z

@MaxRobinsonTheGreat
Hi Max, I’ve made the following updates as per your suggestions:

I separated the skill selection code into 'src/agent/library/skill_library.js'.
I added the 'relevant_docs_count' configuration interface in settings.js.
The template files are now implemented in a different way. I merged them and placed them in 'bots/codeTemplate.json' for the program to read directly. I searched around but couldn't find a way to implement static syntax checking within the SES sandbox because SES doesn’t allow code with static and dynamic import statements to run inside the sandbox.

Take a look and let me know if any further improvements are needed. Feel free to reach out if you need anything!

MaxRobinsonTheGreat

This looks much better and is very close to being done. Thanks for your work. A few small requests:

separate execTemplate and lintTemplate again
don't add skill docs in messages, they should always be in context and it will quickly fill up message history
other little changes

bots/codeTemplate.json

src/agent/action_manager.js

src/agent/coder.js

src/models/qwen.js

Ninot1Quyi · 2025-01-19T14:14:00Z

1. Results:

I have made all the code modifications according to your suggestions. However, I would like to discuss with you whether we should add appropriate error explanations when errors occur during code inspection and execution.

2. Discussion on "Should we add proper error explanation prompts when errors occur during code inspection and execution":

Here are some ideas I've come up with! If you think it's necessary, I can make the changes in the current PR. Alternatively, I can make these improvements to the coder in a new PR after merging. Feel free to share your thoughts by replying to this PR or by email! 😊

2.1 Option 1:

Simply changing the number of "error-related skill_doc" prompts in src/agent/coder.js and src/agent/action_manager.js to 1 will not effectively alleviate the message explosion.

2.2 Option 2:

Add another handling module to replace Assistant's erroneous answers in the coder message log with error-free code from subsequent Assistant responses, and erase the intermediate correction process. This should be a more reasonable approach.

2.3 Option 3:

Modify the history update process in coder.js, separating the debugging conversations between "starting to generate code" and "generated code running correctly," and only add the generated code to the history when there are no errors.

2.4 Option 4:

Extract the code generation process from the complete conversation history and treat it as a separate "small brain" that is solely responsible for code generation, debugging, and execution, much like the cerebellum in the human brain.
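
To make Option 3 above more concrete, here is a rough sketch of keeping debugging turns in a short-lived scratch log and adding only the working code to the long-lived history. CodeSession and its methods are hypothetical names, and the message shapes are assumptions for illustration.

```javascript
// Keep debugging turns in a scratch log; commit only the final working code.
// `CodeSession` is a hypothetical class sketching Option 3.
class CodeSession {
  constructor(history) {
    this.history = history; // long-lived conversation history (array of messages)
    this.scratch = [];      // short-lived debugging turns for the current task
  }

  // Record one generated attempt and, if it failed, the error it produced.
  recordAttempt(code, error) {
    this.scratch.push({ role: "assistant", content: code });
    if (error) this.scratch.push({ role: "system", content: `Error: ${error}` });
  }

  // Messages the LLM sees while debugging: full history plus the scratch log.
  messagesForPrompt() {
    return [...this.history, ...this.scratch];
  }

  // On success, store only the working code and its output, then drop the scratch log.
  commit(code, output) {
    this.history.push({ role: "assistant", content: code });
    this.history.push({ role: "system", content: `Code output: ${output}` });
    this.scratch = [];
  }
}
```

With this split, error explanations can be as detailed as needed during debugging without growing the permanent message history.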

3 Summary:

I suggest this because I want to provide the LLM with correct usage instructions when errors occur, so that it can correct its mistakes rather than getting stuck in its own delusions, unable to fix them.

Perhaps there is a better way to solve this issue, one that provides the LLM with proper help without causing a message explosion.

MaxRobinsonTheGreat · 2025-02-05T21:39:58Z

Merged!! 🎉
Thank you ninot, this is really great. I've tested across many models and all seem to perform well. It also cuts down on the system prompt from ~20k characters to ~5k, so a significant efficiency upgrade. Thank you for your work and patience.

Ninot1Quyi · 2025-02-06T05:48:57Z

Merged!! 🎉 Thank you ninot, this is really great. I've tested across many models and all seem to perform well. It also cuts down on the system prompt from ~20k characters to ~5k, so a significant efficiency upgrade. Thank you for your work and patience.

Hey Max! I'm so glad to have your approval! Can't wait to see more of the awesome things you create. I'll keep doing my part to make Mindcraft even more fun. Also, I'm always happy to help out. Wishing you lots of happiness! :)

Ninot1Quyi added 4 commits November 1, 2024 01:08

Sort docs by relevance to !newAction("task")

90df61d

Add select_num exception range judgment

f264b23

Add select_num exception range judgment

17fa2b6

Code capability enhancement & bot crash fix

ecaf5e8

Ninot1Quyi closed this Nov 2, 2024

Ninot1Quyi force-pushed the Tasks-more-relevant-docs-and-code-exception-fixes branch from ecaf5e8 to 02232e2 Compare November 2, 2024 18:03

Ninot1Quyi added 2 commits November 3, 2024 02:24

Merge remote-tracking branch 'origin/Tasks-more-relevant-docs-and-cod…

80d0c25

…e-exception-fixes' into Tasks-more-relevant-docs-and-code-exception-fixes # Conflicts: # src/agent/coder.js # src/agent/prompter.js

Merger conflict resolution

5e84d69

Ninot1Quyi reopened this Nov 2, 2024

Change note to English

e1dfad9

Ninot1Quyi closed this Nov 4, 2024

Ninot1Quyi force-pushed the Tasks-more-relevant-docs-and-code-exception-fixes branch from e1dfad9 to 0a21561 Compare November 4, 2024 15:05

Ninot1Quyi added 2 commits November 4, 2024 23:20

Merge remote-tracking branch 'origin/Tasks-more-relevant-docs-and-cod…

615af11

…e-exception-fixes' into Tasks-more-relevant-docs-and-code-exception-fixes # Conflicts: # src/agent/coder.js

Resolving merge conflicts with Task Manager

82b37e0

Ninot1Quyi reopened this Nov 4, 2024

Fix spelling mistakes

f6e309a

Ninot1Quyi closed this Nov 8, 2024

Ninot1Quyi force-pushed the Tasks-more-relevant-docs-and-code-exception-fixes branch from f6e309a to a6edd8f Compare November 8, 2024 10:20

Ninot1Quyi added 2 commits November 8, 2024 18:33

Merge remote-tracking branch 'origin/Tasks-more-relevant-docs-and-cod…

043fc78

…e-exception-fixes' into Tasks-more-relevant-docs-and-code-exception-fixes # Conflicts: # src/agent/prompter.js

Resolving conflicts created by adding new annotations

e15c516

Ninot1Quyi reopened this Nov 8, 2024

Ninot1Quyi added 2 commits November 9, 2024 01:29

Improve the relevance of docs to !newAction("task")

c8302c2

Fix Qwen api concurrency limit issue

a368451

code_timeout_mins is set to 3

2322f78

Ninot1Quyi added 2 commits December 13, 2024 23:27

Merge remote-tracking branch 'refs/remotes/upstream/main' into Tasks-…

11c63cb

…more-relevant-docs-and-code-exception-fixes # Conflicts: # src/agent/action_manager.js # src/agent/prompter.js

Resolve merge conflicts with latest code

1835d5e

Preliminary code separation

c5b6cd5

Ninot1Quyi added 5 commits December 16, 2024 19:11

Merge remote-tracking branch 'upstream/main' into Tasks-more-relevant…

37aecb0

…-docs-and-code-exception-fixes # Conflicts: # profiles/qwen.json

Modify the url of qwen.json to default to the international version '…

4c8c61b

…-intl'

Code Separation: Related Skill Selection

b1dad6b

Add setting for number of "relevant_docs_count"

72397c4

Merge code templates into codeTemplate.json

a7000ea

Ninot1Quyi added 2 commits January 4, 2025 12:30

Merge remote-tracking branch 'refs/remotes/upstream/main' into Tasks-…

2127e5b

…more-relevant-docs-and-code-exception-fixes # Conflicts: # src/agent/prompter.js

Resolve merge conflicts in deepseek

a458a66

MaxRobinsonTheGreat requested changes Jan 6, 2025

View reviewed changes

Ninot1Quyi added 8 commits January 12, 2025 23:52

Rollback two code template.js

485d4a6

Rollback two code template.js

8590366

Rename two code template

9be83fe

Rename func 'check' to 'lint'

4782da1

Merge remote-tracking branch 'refs/remotes/upstream/main' into Tasks-…

5dd57dd

…more-relevant-docs-and-code-exception-fixes # Conflicts: # src/agent/prompter.js

Fix Qwen.js to be compatible with OpenAI and add random backoff for r…

1a86c3a

…ate limiting

add the lost || new_resume

1d54af2

Remove the relevant_skill_doc prompts from coder.js and action_manage…

f0396df

…r.js

Ninot1Quyi requested a review from MaxRobinsonTheGreat January 19, 2025 14:16

Ninot1Quyi added 3 commits January 22, 2025 13:53

Merge remote-tracking branch 'refs/remotes/upstream/main' into Tasks-…

2019dff

…more-relevant-docs-and-code-exception-fixes # Conflicts: # settings.js

Merge remote-tracking branch 'refs/remotes/upstream/main' into Tasks-…

c62ee6e

…more-relevant-docs-and-code-exception-fixes # Conflicts: # src/models/prompter.js # src/models/qwen.js

Refactor Qwen.js to OpenAI API style

8277c23

MaxRobinsonTheGreat merged commit 8277c23 into kolbytn:main Feb 5, 2025
