Add sysbox support to remote runtime for eval; Add memory monitor, stress tests to help debug memory issue #6684

Merged on Feb 18, 2025 (118 commits)
Changes from 111 commits
7421aa1
log more mem info
xingyaoww Jan 18, 2025
501824a
simplify remote stress test a little bit
xingyaoww Jan 18, 2025
867f672
reliable way to reproduce error
xingyaoww Jan 18, 2025
c6902da
use a more reasonable tests
xingyaoww Jan 18, 2025
5c44726
Merge branch 'main' into xw/bash-perf
xingyaoww Feb 3, 2025
61b87ce
feat(runtime): add memory monitoring to prevent k8s OOM kills
openhands-agent Feb 3, 2025
54ac167
update lock
xingyaoww Feb 3, 2025
fc18e5c
update memory monitor for action execution server
xingyaoww Feb 3, 2025
feda348
monitor the entire pg
xingyaoww Feb 3, 2025
ed35d53
fix recursive call
xingyaoww Feb 3, 2025
6d6adba
Merge commit 'f24fbec165de33749500dc06c9b6e753b588dbf9' into xw/bash-…
xingyaoww Feb 3, 2025
5f33ae1
update log
xingyaoww Feb 3, 2025
4699e91
use prlimit to restrict memory usage
xingyaoww Feb 3, 2025
7fda066
fix prlimit
xingyaoww Feb 3, 2025
33737cd
also support running stress test locally
xingyaoww Feb 3, 2025
9da4550
log memory stuff in case of high system pressure
xingyaoww Feb 3, 2025
57afe20
tweak tests
xingyaoww Feb 3, 2025
6bc5ca3
combine docker stress test with remote runtime
xingyaoww Feb 3, 2025
d2d57fe
remove save perf debug
xingyaoww Feb 4, 2025
e09ac90
makes it work for both remote and docke rtests
xingyaoww Feb 4, 2025
a1d200c
allow override max memory gb in action execution server; try to get…
xingyaoww Feb 4, 2025
19f025b
ok got this working with docker
xingyaoww Feb 4, 2025
6e678fd
use pss instead of rss for process mem
xingyaoww Feb 4, 2025
74d048b
Merge branch 'main' into xw/bash-perf
enyst Feb 4, 2025
507c0a9
update runtime startup command for remote runtime too
xingyaoww Feb 5, 2025
a6bbbfe
Merge commit '74d048b62341b33e32961b861e6312ed70086ac6' into xw/bash-…
xingyaoww Feb 5, 2025
9b92118
update lock
xingyaoww Feb 5, 2025
f39181c
update stresstest script
xingyaoww Feb 5, 2025
81634ea
Merge commit '5fa2634d6070b84e912bb85017cf686cd7abecdf' into xw/bash-…
xingyaoww Feb 7, 2025
34e36d4
add stress test for file editing
xingyaoww Feb 10, 2025
22b8a21
Implement str_replace_editor on server-side
openhands-agent Feb 10, 2025
cc37799
Refactor str_replace_editor to edit method and update imports
openhands-agent Feb 10, 2025
93e9d96
Remove duplicate write function and incorporate it into edit function
openhands-agent Feb 10, 2025
c902fb2
Refactor edit function to use file_editor from openhands_aci.editor
openhands-agent Feb 10, 2025
dca6e1d
Simplify function_calling.py and update LLMBasedFileEditTool
openhands-agent Feb 10, 2025
13e2282
Update FileEditAction and add translation function for str_replace_ed…
openhands-agent Feb 10, 2025
c042173
Update FileEditAction and translate_str_replace_editor_to_edit_action…
openhands-agent Feb 10, 2025
0de0d66
Improve FileEditAction docstring with detailed mode and attribute exp…
openhands-agent Feb 10, 2025
b41c63e
Remove translated_ipython_code attribute from FileEditAction
openhands-agent Feb 10, 2025
d43ef36
Remove FileEditAction from openhands/events/action/action.py
openhands-agent Feb 10, 2025
313317c
refactor ACI as first-class action
xingyaoww Feb 10, 2025
037ce71
remove translate to jupyter code
xingyaoww Feb 10, 2025
c6e979b
improve repr of file edit action
xingyaoww Feb 10, 2025
a75ab88
simplify
xingyaoww Feb 10, 2025
6835d07
code dedup with file_editor util function
xingyaoww Feb 10, 2025
e8e3437
manually set command
xingyaoww Feb 10, 2025
f4bf811
make error message more informative
xingyaoww Feb 10, 2025
7853b28
better handle permission error
xingyaoww Feb 10, 2025
4f4e5e6
include edit as another fn for runtime
xingyaoww Feb 10, 2025
ebcfd80
rename test
xingyaoww Feb 10, 2025
2919c00
add initial tests
xingyaoww Feb 10, 2025
1b8f59a
Add new tests for create, str_replace, insert, and undo_edit operatio…
openhands-agent Feb 10, 2025
5d19ead
Add comprehensive test suite for file editing operations
openhands-agent Feb 10, 2025
3c079c5
remove extra
xingyaoww Feb 10, 2025
cbd56f1
add tons of tests
xingyaoww Feb 10, 2025
b92f6ba
fix duplicate args
xingyaoww Feb 10, 2025
c9d3da4
Merge branch 'main' into implement-str-replace-editor
xingyaoww Feb 10, 2025
4e44ba7
bump openhands-aci to 0.2.1
xingyaoww Feb 10, 2025
b1cb892
Merge commit 'c9d3da4529040b9bd87ef0084f509f49fcc3706f' into implemen…
xingyaoww Feb 10, 2025
38797b9
update lock
xingyaoww Feb 10, 2025
a9dc6d4
add a memory test that can run in CI
xingyaoww Feb 10, 2025
4277725
Merge commit '38797b9fa1ee49a19934bd4372a818a9a14085c0' into xw/mem-leak
xingyaoww Feb 10, 2025
df01825
tweak monitor
xingyaoww Feb 10, 2025
01685b5
only enable prlimit when max_memory_mb is not None
xingyaoww Feb 11, 2025
3facf9e
update serialization
xingyaoww Feb 11, 2025
3785bc7
add updated event to fe
xingyaoww Feb 11, 2025
7af01d2
remove unnecessary test
xingyaoww Feb 11, 2025
6b031d0
try fix a couple of tests
xingyaoww Feb 11, 2025
c5cab72
fix legacy serialization
xingyaoww Feb 11, 2025
592d099
update doc
xingyaoww Feb 11, 2025
44c3874
handle serialization deprecation for observation
xingyaoww Feb 11, 2025
b7fc4b5
rename fn
xingyaoww Feb 11, 2025
6ae976b
add a bunch of tests for observation serialization
xingyaoww Feb 11, 2025
6277795
make sure command by default is not None
xingyaoww Feb 11, 2025
5d55afa
handle legacy editing action serialization by using regex to parse args
xingyaoww Feb 11, 2025
6b69d60
handle another case for action serialization
xingyaoww Feb 11, 2025
3b1927e
fix test
xingyaoww Feb 11, 2025
22f46fc
Merge branch 'main' into implement-str-replace-editor
neubig Feb 11, 2025
07a61f7
Update openhands/events/action/files.py
enyst Feb 11, 2025
d3a638d
Update openhands/events/action/files.py
enyst Feb 11, 2025
187a266
add filepath to error observation
xingyaoww Feb 11, 2025
41a6ae2
Merge commit '6a6dc93e0379bfdf96096fce06b21a8e51871aec' into xw/mem-leak
xingyaoww Feb 11, 2025
7d56e1d
remove view_range from FileEditAction
xingyaoww Feb 11, 2025
fc56181
Merge commit '7d56e1dc66734d490a30c6e9db120095754232cd' into xw/mem-leak
xingyaoww Feb 11, 2025
f1985bb
only pop if not exists
xingyaoww Feb 11, 2025
a9f2c59
Merge commit 'f1985bb23d7be90e9d8592492a8088e1b3348f4e' into xw/mem-leak
xingyaoww Feb 11, 2025
7b11771
add memory profiler
xingyaoww Feb 11, 2025
c505c58
add memory usage monitor
xingyaoww Feb 11, 2025
d0979e1
set profiler log with info
xingyaoww Feb 11, 2025
7ae2216
allow disable RUNTIME_MEMORY_MONITOR
xingyaoww Feb 11, 2025
70b89b0
Merge branch 'main' into xw/mem-leak
xingyaoww Feb 11, 2025
cc5041a
enable runtime memory monitor for tests
xingyaoww Feb 11, 2025
3de6a2f
use pss as backend
xingyaoww Feb 11, 2025
af5a771
try sysbox
xingyaoww Feb 11, 2025
0dde700
Merge commit '70b89b0d8b1e3ab9120e5ed5f11f96ad355ff19a' into xw/mem-leak
xingyaoww Feb 11, 2025
b6fc701
Revert "try sysbox"
xingyaoww Feb 11, 2025
56c24f8
do not use jupyter execution for editing directly
xingyaoww Feb 11, 2025
d6514a1
fix test
xingyaoww Feb 11, 2025
518e89f
fix test
xingyaoww Feb 11, 2025
a65bca0
add debugging logs
xingyaoww Feb 11, 2025
be4f801
Revert "Revert "try sysbox""
xingyaoww Feb 11, 2025
ab40912
stop using nice since it is not compatible with sysbox?
xingyaoww Feb 12, 2025
fc60f6b
revert test bash
xingyaoww Feb 12, 2025
737f04f
Merge commit '7e359eda4af6c66d3b08efd1db0d74ff6bc7ccd4' into xw/mem-leak
xingyaoww Feb 12, 2025
698c01f
handle cases when we terminate the agent, the cd failed due to prev c…
xingyaoww Feb 13, 2025
0540efc
make runtime class a config and set it to sysbox for swebench eval
xingyaoww Feb 13, 2025
8242721
refactor config logic into get_default_sandbox_config
xingyaoww Feb 13, 2025
fcba1af
Refactor SandboxConfig to use get_default_sandbox_config_for_eval
openhands-agent Feb 13, 2025
3140e98
refactored couple evals
xingyaoww Feb 13, 2025
5524939
refactor remaining tests
xingyaoww Feb 13, 2025
37158d5
remove resource mapper file since it is not needed
xingyaoww Feb 13, 2025
399907e
Update openhands/runtime/utils/memory_monitor.py
enyst Feb 15, 2025
4d467ed
Update openhands/runtime/utils/memory_monitor.py
enyst Feb 15, 2025
b09fed3
Merge branch 'main' into xw/mem-leak
enyst Feb 15, 2025
02ea27a
poetry lock
enyst Feb 15, 2025
4345611
Merge commit '6c4801360117559bf8b23f061af433123cff8de9' into xw/mem-leak
xingyaoww Feb 17, 2025
bc94fdd
Merge commit '8d097efb4fdabeb9abd8be388315d49da5478b81' into xw/mem-leak
xingyaoww Feb 18, 2025
a2971fd
set default runtime class to sysbox
xingyaoww Feb 18, 2025
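
Several commits above switch the memory measurement from RSS to PSS ("use pss instead of rss for process mem", "use pss as backend"). The PR's own monitor code is not shown in this view, so the following is only a minimal standard-library sketch of reading PSS on Linux. It assumes the kernel exposes `/proc/<pid>/smaps_rollup`; the helper names `parse_pss_kb` and `read_pss_kb` are hypothetical and are not taken from the PR.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the aggregate "Pss:" line only; "Pss_Anon:" and similar lines
# do not match because the colon must follow "Pss" directly.
_PSS_LINE = re.compile(r'^Pss:\s+(\d+)\s+kB', re.MULTILINE)


def parse_pss_kb(smaps_rollup_text: str) -> Optional[int]:
    """Return the PSS value in kB from smaps_rollup-style text, or None."""
    match = _PSS_LINE.search(smaps_rollup_text)
    return int(match.group(1)) if match else None


def read_pss_kb(pid: str = 'self') -> Optional[int]:
    """Read PSS for a process; returns None where the file is unavailable."""
    try:
        return parse_pss_kb(Path(f'/proc/{pid}/smaps_rollup').read_text())
    except OSError:
        return None
```

PSS divides each shared page among the processes mapping it, so summing PSS over a process group does not count shared pages once per process the way summing RSS does.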
11 changes: 4 additions & 7 deletions evaluation/benchmarks/EDA/run_infer.py
@@ -9,6 +9,7 @@
     EvalMetadata,
     EvalOutput,
     compatibility_for_eval_history_pairs,
+    get_default_sandbox_config_for_eval,
     make_metadata,
     prepare_dataset,
     reset_logger_for_multiprocessing,
@@ -17,7 +18,6 @@
 from openhands.controller.state.state import State
 from openhands.core.config import (
     AppConfig,
-    SandboxConfig,
     get_llm_config_arg,
     get_parser,
 )
@@ -60,17 +60,14 @@ def codeact_user_response_eda(state: State) -> str:
 def get_config(
     metadata: EvalMetadata,
 ) -> AppConfig:
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = 'python:3.12-bookworm'
     config = AppConfig(
         default_agent=metadata.agent_class,
         run_as_openhands=False,
         runtime='docker',
         max_iterations=metadata.max_iterations,
-        sandbox=SandboxConfig(
-            base_container_image='python:3.12-bookworm',
-            enable_auto_lint=False,
-            use_host_network=False,
-            remote_runtime_enable_retries=True,
-        ),
+        sandbox=sandbox_config,
         # do not mount workspace
         workspace_base=None,
         workspace_mount_path=None,
16 changes: 5 additions & 11 deletions evaluation/benchmarks/agent_bench/run_infer.py
@@ -17,6 +17,7 @@
     EvalMetadata,
     EvalOutput,
     compatibility_for_eval_history_pairs,
+    get_default_sandbox_config_for_eval,
     make_metadata,
     prepare_dataset,
     reset_logger_for_multiprocessing,
@@ -25,7 +26,6 @@
 from openhands.controller.state.state import State
 from openhands.core.config import (
     AppConfig,
-    SandboxConfig,
     get_llm_config_arg,
     parse_arguments,
 )
@@ -40,21 +40,15 @@
 def get_config(
     metadata: EvalMetadata,
 ) -> AppConfig:
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = 'python:3.12-slim'
+
     config = AppConfig(
         default_agent=metadata.agent_class,
         run_as_openhands=False,
         runtime=os.environ.get('RUNTIME', 'docker'),
         max_iterations=metadata.max_iterations,
-        sandbox=SandboxConfig(
-            base_container_image='python:3.12-slim',
-            enable_auto_lint=True,
-            use_host_network=False,
-            api_key=os.environ.get('ALLHANDS_API_KEY', None),
-            remote_runtime_api_url=os.environ.get('SANDBOX_REMOTE_RUNTIME_API_URL'),
-            keep_runtime_alive=False,
-            remote_runtime_init_timeout=3600,
-            remote_runtime_enable_retries=True,
-        ),
+        sandbox=sandbox_config,
         # do not mount workspace
         workspace_base=None,
         workspace_mount_path=None,
16 changes: 4 additions & 12 deletions evaluation/benchmarks/aider_bench/run_infer.py
@@ -16,6 +16,7 @@
     EvalMetadata,
     EvalOutput,
     compatibility_for_eval_history_pairs,
+    get_default_sandbox_config_for_eval,
     make_metadata,
     prepare_dataset,
     reset_logger_for_multiprocessing,
@@ -24,7 +25,6 @@
 from openhands.controller.state.state import State
 from openhands.core.config import (
     AppConfig,
-    SandboxConfig,
     get_llm_config_arg,
     load_from_toml,
     parse_arguments,
@@ -47,22 +47,14 @@
 def get_config(
     metadata: EvalMetadata,
 ) -> AppConfig:
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = 'python:3.11-bookworm'
     config = AppConfig(
         default_agent=metadata.agent_class,
         run_as_openhands=False,
         runtime=os.environ.get('RUNTIME', 'docker'),
         max_iterations=metadata.max_iterations,
-        sandbox=SandboxConfig(
-            base_container_image='python:3.11-bookworm',
-            enable_auto_lint=True,
-            use_host_network=False,
-            timeout=100,
-            api_key=os.environ.get('ALLHANDS_API_KEY', None),
-            remote_runtime_api_url=os.environ.get('SANDBOX_REMOTE_RUNTIME_API_URL'),
-            keep_runtime_alive=False,
-            remote_runtime_init_timeout=1800,
-            remote_runtime_enable_retries=True,
-        ),
+        sandbox=sandbox_config,
         # do not mount workspace
         workspace_base=None,
         workspace_mount_path=None,
11 changes: 4 additions & 7 deletions evaluation/benchmarks/biocoder/run_infer.py
@@ -14,6 +14,7 @@
     EvalOutput,
     codeact_user_response,
     compatibility_for_eval_history_pairs,
+    get_default_sandbox_config_for_eval,
     make_metadata,
     prepare_dataset,
     reset_logger_for_multiprocessing,
@@ -22,7 +23,6 @@
 from openhands.controller.state.state import State
 from openhands.core.config import (
     AppConfig,
-    SandboxConfig,
     get_llm_config_arg,
     parse_arguments,
 )
@@ -57,18 +57,15 @@ def get_config(
     metadata: EvalMetadata,
 ) -> AppConfig:
     BIOCODER_BENCH_CONTAINER_IMAGE = 'public.ecr.aws/i5g0m1f6/eval_biocoder:v1.0'
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = BIOCODER_BENCH_CONTAINER_IMAGE
 
     config = AppConfig(
         default_agent=metadata.agent_class,
         run_as_openhands=False,
         runtime='docker',
         max_iterations=metadata.max_iterations,
-        sandbox=SandboxConfig(
-            base_container_image=BIOCODER_BENCH_CONTAINER_IMAGE,
-            enable_auto_lint=True,
-            use_host_network=False,
-            remote_runtime_enable_retries=True,
-        ),
+        sandbox=sandbox_config,
         # do not mount workspace
         workspace_base=None,
         workspace_mount_path=None,
12 changes: 5 additions & 7 deletions evaluation/benchmarks/bird/run_infer.py
@@ -17,6 +17,7 @@
     EvalMetadata,
     EvalOutput,
     compatibility_for_eval_history_pairs,
+    get_default_sandbox_config_for_eval,
     make_metadata,
     prepare_dataset,
     reset_logger_for_multiprocessing,
@@ -25,7 +26,6 @@
 from openhands.controller.state.state import State
 from openhands.core.config import (
     AppConfig,
-    SandboxConfig,
     get_llm_config_arg,
     parse_arguments,
 )
@@ -71,17 +71,15 @@ def codeact_user_response(state: State) -> str:
 def get_config(
     metadata: EvalMetadata,
 ) -> AppConfig:
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = 'python:3.12-bookworm'
+
     config = AppConfig(
         default_agent=metadata.agent_class,
         run_as_openhands=False,
         runtime='docker',
         max_iterations=metadata.max_iterations,
-        sandbox=SandboxConfig(
-            base_container_image='python:3.12-bookworm',
-            enable_auto_lint=True,
-            use_host_network=False,
-            remote_runtime_enable_retries=True,
-        ),
+        sandbox=sandbox_config,
         # do not mount workspace
         workspace_base=None,
         workspace_mount_path=None,
11 changes: 4 additions & 7 deletions evaluation/benchmarks/browsing_delegation/run_infer.py
@@ -10,6 +10,7 @@
     EvalMetadata,
     EvalOutput,
     compatibility_for_eval_history_pairs,
+    get_default_sandbox_config_for_eval,
     make_metadata,
     prepare_dataset,
     reset_logger_for_multiprocessing,
@@ -18,7 +19,6 @@
 from openhands.controller.state.state import State
 from openhands.core.config import (
     AppConfig,
-    SandboxConfig,
     get_llm_config_arg,
     parse_arguments,
 )
@@ -36,17 +36,14 @@ def get_config(
     assert (
         metadata.max_iterations == 1
     ), 'max_iterations must be 1 for browsing delegation evaluation.'
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = 'python:3.12-bookworm'
     config = AppConfig(
         default_agent=metadata.agent_class,
         run_as_openhands=False,
         runtime='docker',
         max_iterations=metadata.max_iterations,
-        sandbox=SandboxConfig(
-            base_container_image='python:3.12-bookworm',
-            enable_auto_lint=False,
-            use_host_network=False,
-            remote_runtime_enable_retries=True,
-        ),
+        sandbox=sandbox_config,
         workspace_base=None,
         workspace_mount_path=None,
     )
24 changes: 5 additions & 19 deletions evaluation/benchmarks/commit0_bench/run_infer.py
@@ -15,6 +15,7 @@
     EvalOutput,
     assert_and_raise,
     codeact_user_response,
+    get_default_sandbox_config_for_eval,
     make_metadata,
     prepare_dataset,
     reset_logger_for_multiprocessing,
@@ -25,7 +26,6 @@
 from openhands.core.config import (
     AgentConfig,
     AppConfig,
-    SandboxConfig,
     get_llm_config_arg,
     get_parser,
 )
@@ -105,38 +105,24 @@ def get_config(
     instance: pd.Series,
     metadata: EvalMetadata,
 ) -> AppConfig:
-    # COMMIT0_CONTAINER_IMAGE = 'wentingzhao/'
     assert USE_INSTANCE_IMAGE
     # We use a different instance image for the each instance of commit0 eval
     repo_name = instance['repo'].split('/')[1]
     base_container_image = get_instance_docker_image(repo_name)
     logger.info(
         f'Using instance container image: {base_container_image}. '
         f'Please make sure this image exists. '
         f'Submit an issue on https://github.com/All-Hands-AI/OpenHands if you run into any issues.'
     )
-    # else:
-    # raise
-    # base_container_image = SWE_BENCH_CONTAINER_IMAGE
-    # logger.info(f'Using swe-bench container image: {base_container_image}')
-
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = base_container_image
+
     config = AppConfig(
         default_agent=metadata.agent_class,
         run_as_openhands=False,
         max_iterations=metadata.max_iterations,
         runtime=os.environ.get('RUNTIME', 'docker'),
-        sandbox=SandboxConfig(
-            base_container_image=base_container_image,
-            enable_auto_lint=True,
-            use_host_network=False,
-            # large enough timeout, since some testcases take very long to run
-            timeout=300,
-            api_key=os.environ.get('ALLHANDS_API_KEY', None),
-            remote_runtime_api_url=os.environ.get('SANDBOX_REMOTE_RUNTIME_API_URL'),
-            keep_runtime_alive=False,
-            remote_runtime_init_timeout=3600,
-            remote_runtime_enable_retries=True,
-        ),
+        sandbox=sandbox_config,
         # do not mount workspace
         workspace_base=None,
         workspace_mount_path=None,
11 changes: 4 additions & 7 deletions evaluation/benchmarks/discoverybench/run_infer.py
@@ -16,6 +16,7 @@
     EvalOutput,
     codeact_user_response,
     compatibility_for_eval_history_pairs,
+    get_default_sandbox_config_for_eval,
     make_metadata,
     prepare_dataset,
     reset_logger_for_multiprocessing,
@@ -25,7 +26,6 @@
 from openhands.core.config import (
     AgentConfig,
     AppConfig,
-    SandboxConfig,
     get_llm_config_arg,
     parse_arguments,
 )
@@ -62,17 +62,14 @@
 def get_config(
     metadata: EvalMetadata,
 ) -> AppConfig:
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = 'python:3.12-bookworm'
     config = AppConfig(
         default_agent=metadata.agent_class,
         run_as_openhands=False,
         runtime='docker',
         max_iterations=metadata.max_iterations,
-        sandbox=SandboxConfig(
-            base_container_image='python:3.12-bookworm',
-            enable_auto_lint=True,
-            use_host_network=False,
-            remote_runtime_enable_retries=True,
-        ),
+        sandbox=sandbox_config,
         # do not mount workspace
         workspace_base=None,
         workspace_mount_path=None,
11 changes: 4 additions & 7 deletions evaluation/benchmarks/gaia/run_infer.py
@@ -13,6 +13,7 @@
     EvalOutput,
     codeact_user_response,
     compatibility_for_eval_history_pairs,
+    get_default_sandbox_config_for_eval,
     make_metadata,
     prepare_dataset,
     reset_logger_for_multiprocessing,
@@ -21,7 +22,6 @@
 from openhands.controller.state.state import State
 from openhands.core.config import (
     AppConfig,
-    SandboxConfig,
     get_llm_config_arg,
     get_parser,
 )
@@ -47,17 +47,14 @@
 def get_config(
     metadata: EvalMetadata,
 ) -> AppConfig:
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = 'python:3.12-bookworm'
     config = AppConfig(
         default_agent=metadata.agent_class,
         run_as_openhands=False,
         runtime='docker',
         max_iterations=metadata.max_iterations,
-        sandbox=SandboxConfig(
-            base_container_image='python:3.12-bookworm',
-            enable_auto_lint=True,
-            use_host_network=False,
-            remote_runtime_enable_retries=True,
-        ),
+        sandbox=sandbox_config,
         # do not mount workspace
         workspace_base=None,
         workspace_mount_path=None,
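
Every file above applies the same refactor: the inline `SandboxConfig(...)` is replaced by a call to `get_default_sandbox_config_for_eval()`, and the caller then overrides `base_container_image`. The helper's body is not part of the hunks shown here, so the following is only a self-contained sketch of the pattern. The `SandboxConfig` dataclass is a stand-in limited to fields visible in the removed lines, its field defaults are assumptions, and the values chosen inside the helper are illustrative rather than the real defaults.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class SandboxConfig:
    """Stand-in for openhands.core.config.SandboxConfig.

    Only fields that appear in the removed diff lines are modelled;
    the default values here are assumptions for illustration.
    """

    base_container_image: Optional[str] = None
    enable_auto_lint: bool = False
    use_host_network: bool = False
    timeout: int = 120
    api_key: Optional[str] = None
    remote_runtime_api_url: Optional[str] = None
    keep_runtime_alive: bool = False
    remote_runtime_init_timeout: int = 180
    remote_runtime_enable_retries: bool = False


def get_default_sandbox_config_for_eval() -> SandboxConfig:
    """Hypothetical body: one shared set of eval defaults (values assumed)."""
    return SandboxConfig(
        enable_auto_lint=True,
        use_host_network=False,
        timeout=300,
        api_key=os.environ.get('ALLHANDS_API_KEY', None),
        remote_runtime_api_url=os.environ.get('SANDBOX_REMOTE_RUNTIME_API_URL'),
        keep_runtime_alive=False,
        remote_runtime_init_timeout=3600,
        remote_runtime_enable_retries=True,
    )


# Per-benchmark usage, mirroring the added lines in the diffs above.
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
```

One consequence of centralizing the defaults is that per-benchmark settings the old code set inline (for example `enable_auto_lint=False` in EDA, or `timeout=100` in aider_bench) no longer appear in the benchmark files. Whether the real helper preserves them is not visible in this diff.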