Make unit tests use redis #6245

cognifloyd · 2024-09-17T21:04:54Z

The graceful_shutdown tests have been very flaky using the NoOpDriver. Switching from NoOpDriver to RedisDriver will hopefully make our CI more stable.

This was discussed in a TSC meeting and implemented by @FileMagic (#6223) and @guzzijones (#6236). I cherry-picked the redis-related parts of their excellent work. After stumbling through the cherry-picks (I missed a few things, and had to rebase/cherry-pick a few times), I refactored a few things:

GHA: use the redis container defined in services, dropping the tasks that managed it manually.
Use ST2TESTS_REDIS_* as the format for new env vars instead of ST2_OVERRIDE_COORDINATOR_REDIS_*.
Fix a regression I added in "Clean up import side effects in tests" (#6241) that only became apparent once we stopped using NoOpDriver. It's surprising and scary how many of our tests rely on import-time side effects.
Refactor tests to ensure each test gets a clean config. Several tests relied on config changes made by previous tests, which made them brittle. Resetting the config per test should remove that coupling.

👏 Thank you @FileMagic and @guzzijones for getting this figured out! I have high hopes for more stable CI thanks to your excellent work! 🎉

This reverts commit 820001d.

All tests (unit, integration, pack) need redis now.

tests_config.parse_args is called when importing st2tests and again in setUpClass. if this is needed, try self.reset()

get_members should return a list or tuple, not a string. I noticed while debugging that get_members output ended up as ['m', 'e', 'm', 'b', 'e', 'r', '-', '1'] because it listified the string. So, use a tuple when creating the mocked NoOpAsyncResult so it is closer to the actual return values.
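The listification problem is easy to reproduce in isolation. The following is a minimal sketch, not the st2 code: FakeAsyncResult is a hypothetical stand-in for the mocked NoOpAsyncResult, and the only assumption is that the caller passes the result through list().

```python
class FakeAsyncResult:
    """Hypothetical stand-in for a mocked async result (not the st2 class)."""

    def __init__(self, result):
        self._result = result

    def get(self):
        return self._result


def member_list(async_result):
    # A str is iterable, so list() splits it into single characters.
    # A tuple of member ids survives list() unchanged, one item per member.
    return list(async_result.get())


if __name__ == "__main__":
    print(member_list(FakeAsyncResult("member-1")))     # string: one item per character
    print(member_list(FakeAsyncResult(("member-1",))))  # tuple: one item per member
```

Returning a tuple from the mock keeps its shape close to what a real driver hands back, so the test exercises the same code path as production.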

cognifloyd · 2024-09-19T03:36:16Z

st2actions/tests/unit/test_policies.py

+# This needs to run before creating FakeConcurrencyApplicator below.
+tests_config.parse_args()


These tests were inadvertently using the NoOpDriver because of the import-time creation of FakeConcurrencyApplicator in the @mock.patch.object decorators. parse_args() wasn't called until setUpClass which happens after import time.

I found this by putting some raise ValueError in codepaths that create the NoOpDriver to make sure nothing was using it.
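The ordering problem can be shown without any st2 code. This is a minimal sketch under stated assumptions: parse_args, make_fake_applicator, Target, and PolicyTest are hypothetical stand-ins, and EVENTS only exists to record the order of calls. It shows that an argument to @mock.patch.object is evaluated while the class body executes (at import time), before setUpClass runs.

```python
import unittest
from unittest import mock

EVENTS = []  # records the order in which things happen


def parse_args():
    # Hypothetical stand-in for tests_config.parse_args().
    EVENTS.append("parse_args")


def make_fake_applicator():
    # Hypothetical stand-in for creating FakeConcurrencyApplicator.
    EVENTS.append("create_fake")
    return mock.MagicMock()


class Target:
    applicator = None


class PolicyTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        parse_args()

    # The decorator argument is evaluated while the class body executes,
    # i.e. at import time, long before setUpClass runs.
    @mock.patch.object(Target, "applicator", make_fake_applicator())
    def test_something(self):
        self.assertIsNotNone(Target.applicator)


def run_demo():
    # Run the test case and return (success flag, recorded call order).
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PolicyTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful(), list(EVENTS)
```

In this sketch the fake is created first and parse_args runs second, which is why a module-level parse_args() call placed above the class is needed when the fake depends on the parsed config.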

cognifloyd · 2024-09-19T03:39:04Z

st2actions/tests/unit/test_worker.py

+        tests_config.reset()
+        tests_config.parse_args()


Since these tests need to control the config, changing values in each test, it makes more sense to set up cfg.CONF per test instead of only during setUpClass. These two lines ensure that the config has been reset to a clean state before proceeding.

In a future refactor, this would be a good function to turn into a pytest fixture, after we've got pytest running everything.
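The per-test reset described above can be sketched with plain unittest. Everything here is a hypothetical stand-in: CONF is an ordinary dict rather than oslo.config's cfg.CONF, and reset/parse_args only imitate the roles of tests_config.reset() and tests_config.parse_args(). The same two calls could later move into a pytest fixture.

```python
import unittest

DEFAULTS = {"coordination_url": None, "workers": 1}
CONF = dict(DEFAULTS)  # hypothetical stand-in for the global cfg.CONF


def reset():
    # Stand-in for tests_config.reset(): drop every override.
    CONF.clear()


def parse_args():
    # Stand-in for tests_config.parse_args(): reapply the test defaults.
    CONF.update(DEFAULTS)


class WorkerTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test, so no test sees another test's overrides.
        reset()
        parse_args()

    def test_a_overrides_config(self):
        CONF["workers"] = 5
        self.assertEqual(CONF["workers"], 5)

    def test_b_sees_clean_config(self):
        # Would fail if the override from test_a leaked into this test.
        self.assertEqual(CONF["workers"], 1)


def run_demo():
    # Run both tests in order and report whether they all passed.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WorkerTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

If the reset lived only in setUpClass, test_b_sees_clean_config would observe the value written by test_a_overrides_config and fail.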

cognifloyd · 2024-09-19T03:40:01Z

st2actions/tests/unit/test_workflow_engine.py

+        tests_config.reset()
+        tests_config.parse_args()


The same issues with tests_config apply to this file too. So, reset it per test, not just per class.

cognifloyd · 2024-09-19T03:43:05Z

st2actions/tests/unit/test_workflow_engine.py

-            coordination.ToozConnectionError("foobar"),
-            coordination.ToozConnectionError("foobar"),


Without this change, the mocked side effect raises too many times, resulting in this traceback and test error:

======================================================================
1) ERROR: test_process_error_handling (tests.unit.test_workflow_engine.WorkflowExecutionHandlerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  virtualenv/lib/python3.8/site-packages/mock/mock.py line 1452 in patched
    return func(*newargs, **newkeywargs)
  st2actions/tests/unit/test_workflow_engine.py line 227 in test_process_error_handling
    workflows.get_engine().process(t1_ac_ex_db)
  st2actions/st2actions/workflows/workflows.py line 103 in process
    self.fail_workflow_execution(message, e)
  st2actions/st2actions/workflows/workflows.py line 191 in fail_workflow_execution
    wf_svc.update_task_state(task_ex_id, ac_const.LIVEACTION_STATUS_FAILED)
  virtualenv/lib/python3.8/site-packages/retrying.py line 56 in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  virtualenv/lib/python3.8/site-packages/retrying.py line 257 in call
    return attempt.get(self._wrap_exception)
  virtualenv/lib/python3.8/site-packages/retrying.py line 301 in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  virtualenv/lib/python3.8/site-packages/six.py line 719 in reraise
    raise value
  virtualenv/lib/python3.8/site-packages/retrying.py line 251 in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  virtualenv/lib/python3.8/site-packages/retrying.py line 56 in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  virtualenv/lib/python3.8/site-packages/retrying.py line 266 in call
    raise attempt.get()
  virtualenv/lib/python3.8/site-packages/retrying.py line 301 in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  virtualenv/lib/python3.8/site-packages/six.py line 719 in reraise
    raise value
  virtualenv/lib/python3.8/site-packages/retrying.py line 251 in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  st2common/st2common/services/workflows.py line 1054 in update_task_state
    update_execution_records(
  st2common/st2common/services/workflows.py line 1475 in update_execution_records
    ex_svc.update_execution(wf_lv_ac_db, publish=pub_ac_ex, set_result_size=True)
  st2common/st2common/services/executions.py line 199 in update_execution
    with coordination.get_coordinator().get_lock(str(liveaction_db.id).encode()):
  virtualenv/lib/python3.8/site-packages/mock/mock.py line 1178 in __call__
    return _mock_self._mock_call(*args, **kwargs)
  virtualenv/lib/python3.8/site-packages/mock/mock.py line 1182 in _mock_call
    return _mock_self._execute_mock_call(*args, **kwargs)
  virtualenv/lib/python3.8/site-packages/mock/mock.py line 1243 in _execute_mock_call
    raise result
ToozConnectionError: foobar
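The general mechanism behind this kind of failure can be sketched with the standard library alone. This is an illustration under stated assumptions, not the st2 test: ConnectionFailure, call_with_retries, and the budget of three attempts are hypothetical, and the real code retries through the retrying library. A mock given a side_effect list consumes one item per call, raising items that are exceptions, so one extra exception in the list can use up the last retry.

```python
from unittest import mock


class ConnectionFailure(Exception):
    """Hypothetical stand-in for a transient connection error."""


def call_with_retries(func, max_attempts):
    # Minimal retry loop; the real code uses the `retrying` library instead.
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ConnectionFailure:
            if attempt == max_attempts:
                raise


def run(side_effect, max_attempts=3):
    # Each call to the mock consumes the next item in side_effect;
    # exception instances are raised, anything else is returned.
    fake = mock.MagicMock(side_effect=side_effect)
    try:
        return call_with_retries(fake, max_attempts)
    except ConnectionFailure:
        return "retries exhausted"


if __name__ == "__main__":
    # Two failures fit inside a budget of three attempts.
    print(run([ConnectionFailure(), ConnectionFailure(), "ok"]))
    # A third failure consumes the last attempt, so the error escapes.
    print(run([ConnectionFailure(), ConnectionFailure(), ConnectionFailure(), "ok"]))
```

Whenever the number of mocked calls that happen before the assertion changes, the side_effect list has to be resized to match, which is what the removed lines do here.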

cognifloyd · 2024-09-19T04:13:00Z

st2common/tests/unit/services/test_workflow_service_retries.py

@@ -195,7 +196,6 @@ def test_retries_exhausted_from_coordinator_connection_error(self, mock_get_lock
        "update_task_state",
        mock.MagicMock(
            side_effect=[
-                mongoengine.connection.ConnectionFailure(),


Without this change, the mocked side effect raises too many times, resulting in this traceback and test error on at least python3.9 (and probably newer):

======================================================================
1) ERROR: test_recover_from_database_connection_error (tests.unit.services.test_workflow_service_retries.OrquestaServiceRetryTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  virtualenv/lib/python3.9/site-packages/mock/mock.py line 1452 in patched
    return func(*newargs, **newkeywargs)
  st2common/tests/unit/services/test_workflow_service_retries.py line 222 in test_recover_from_database_connection_error
    wf_svc.handle_action_execution_completion(tk1_ac_ex_db)
  virtualenv/lib/python3.9/site-packages/retrying.py line 56 in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  virtualenv/lib/python3.9/site-packages/retrying.py line 266 in call
    raise attempt.get()
  virtualenv/lib/python3.9/site-packages/retrying.py line 301 in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  virtualenv/lib/python3.9/site-packages/six.py line 719 in reraise
    raise value
  virtualenv/lib/python3.9/site-packages/retrying.py line 251 in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  st2common/st2common/services/workflows.py line 969 in handle_action_execution_completion
    update_task_state(
  virtualenv/lib/python3.9/site-packages/mock/mock.py line 1178 in __call__
    return _mock_self._mock_call(*args, **kwargs)
  virtualenv/lib/python3.9/site-packages/mock/mock.py line 1182 in _mock_call
    return _mock_self._execute_mock_call(*args, **kwargs)
  virtualenv/lib/python3.9/site-packages/mock/mock.py line 1243 in _execute_mock_call
    raise result
ConnectionFailure:

cognifloyd · 2024-09-19T04:28:04Z

st2tests/st2tests/config.py

+    redis_host = os.environ.get("ST2TESTS_REDIS_HOST", False)
+    if redis_host:
+        redis_port = os.environ.get("ST2TESTS_REDIS_PORT", "6379")
+        driver = f"redis://{redis_host}:{redis_port}"


ST2TESTS_* is the convention I've been using for vars that configure tests:

st2/st2tests/st2tests/config.py

Lines 89 to 90 in 78c3248

db_name = f"st2-test{os.environ.get('ST2TESTS_PARALLEL_SLOT', '')}"
CONF.set_override(name="db_name", override=db_name, group="database")

st2/st2tests/st2tests/config.py

Lines 107 to 109 in 78c3248

system_user = os.environ.get("ST2TESTS_SYSTEM_USER", "")
if system_user:
    CONF.set_override(name="user", override=system_user, group="system_user")
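For reference, the env-var handling from the diff above can be isolated into a small function. This is a sketch only: redis_url_from_env is a hypothetical helper name that does not exist in st2tests, and it returns the URL instead of applying it as a config override.

```python
import os


def redis_url_from_env(environ=os.environ):
    """Return a redis:// URL if ST2TESTS_REDIS_HOST is set, else None."""
    host = environ.get("ST2TESTS_REDIS_HOST")
    if not host:
        # No host configured: the caller keeps whatever driver is the default.
        return None
    # The port is optional and falls back to the standard redis port.
    port = environ.get("ST2TESTS_REDIS_PORT", "6379")
    return f"redis://{host}:{port}"


if __name__ == "__main__":
    print(redis_url_from_env({"ST2TESTS_REDIS_HOST": "127.0.0.1"}))
```

Passing the environment as a parameter keeps the function easy to test without mutating os.environ.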

cognifloyd · 2024-09-19T04:31:19Z

.github/workflows/ci.yaml

-      - name: Run Redis Service Container
-        timeout-minutes: 2
-        run: |
-          docker run --rm --detach -p 127.0.0.1:6379:6379/tcp --name redis redis:latest
-          until [ "$(docker inspect -f {{.State.Running}} redis)" == "true" ]; do sleep 0.1; done


We only need one redis container, either started here or in services above. The services approach is a bit nicer because we know that redis is up and responding (thanks to the health check that GHA waits for before continuing), whereas this just makes sure the container (not redis in the container) is running.

This was also the cause of the conflicts where redis was already in use. Getting rid of this allows us to use the simple redis name for the service container instead of redis-server.
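The difference between "the container is running" and "redis is responding" can be sketched as a small probe. This is an illustration, not what the GHA health check runs: redis_is_responding is a hypothetical helper that sends an inline PING over a plain socket and assumes a redis server with no authentication, which answers +PONG.

```python
import socket


def redis_is_responding(host="127.0.0.1", port=6379, timeout=1.0):
    """Return True only if the server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # Redis accepts inline commands terminated by CRLF.
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Connection refused, timed out, or reset: not ready yet.
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print(redis_is_responding())
```

A check based on docker inspect can report true while this probe still returns False, because the process inside the container may not be accepting connections yet.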

cognifloyd · 2024-09-19T05:04:41Z

The Test / Test (pants runs: pytest) CI failures are not related to this PR. To validate that, I re-ran the successful CI run in #6244, and now it fails there too. So, it seems to be some GHA issue.

Luckily, the pants workflows are not required to merge PRs (in branch protection rules), so please ignore the failure.

guzzijones · 2024-09-19T12:48:58Z

Thanks for going through this. The noop driver random failures were a pain to deal with.

nzlosh

Thanks, this is great work!

This makes the orquesta workflow consistent with ci workflow changes added in #6245

[Broken] Attempt to add redis and add stdout printing to debug issues

913fd97

pull-request-size bot added the size/L label (PR that changes 100-499 lines. Requires some effort to review.) Sep 17, 2024

cognifloyd force-pushed the unit-tests-use-redis branch from 2697dea to f2f1a62 Compare September 18, 2024 05:29

guzzijones and others added 11 commits September 18, 2024 10:30

working unit tests

913b5f0

lint fixes

7f50e44

Add redis vars to more Makefile targets

4ca0722

enable redis

c91491d

Revert "enable redis"

668a90e

This reverts commit 820001d.

add back redis

f05f081

already redis running integration job

e647d56

remove bad comment in ci

914b0d8

GHA: always run redis container instead of starting as-needed

009f643

All tests (unit, integration, pack) need redis now.

rename ST2_OVERRIDE_... env vars to follow ST2TESTS_* convention

6750d68

comment out parse_args

70fa0ba

tests_config.parse_args is called when importing st2tests and again in setUpClass. if this is needed, try self.reset()

cognifloyd force-pushed the unit-tests-use-redis branch 3 times, most recently from 8c16324 to f772e71 Compare September 18, 2024 21:11

cognifloyd added 5 commits September 18, 2024 18:56

reset config between actionrunner worker tests

3944b78

DRY config setup in test_worker

7d1b1f4

tests_config.parse_args is required. ugh

847907f

Use RedisDriver instead of NoOpDriver

c35d446

cognifloyd force-pushed the unit-tests-use-redis branch from 9f6f8d0 to 5cfd1ff Compare September 19, 2024 03:33

cognifloyd added 2 commits September 18, 2024 22:53

DRY config setup in test_workflow_engine

223c7a6

fmt

78c3248

cognifloyd force-pushed the unit-tests-use-redis branch 3 times, most recently from ab46c40 to 78c3248 Compare September 19, 2024 04:11

cognifloyd commented Sep 19, 2024

add changelog entry

774f45b

cognifloyd added this to the 3.9.0 milestone Sep 19, 2024

cognifloyd added the tests, service: workflow engine, and infrastructure: ci/cd labels Sep 19, 2024

cognifloyd requested review from winem, nzlosh, mamercad, rush-skills, guzzijones, amanda11, a team and khushboobhatia01 September 19, 2024 04:53

cognifloyd self-assigned this Sep 19, 2024

cognifloyd marked this pull request as ready for review September 19, 2024 04:54

cognifloyd enabled auto-merge September 19, 2024 05:04

guzzijones approved these changes Sep 19, 2024

nzlosh approved these changes Sep 19, 2024

cognifloyd merged commit d985b0d into master Sep 19, 2024
31 of 33 checks passed

cognifloyd deleted the unit-tests-use-redis branch September 19, 2024 12:55

cognifloyd added a commit that referenced this pull request Sep 19, 2024

gha: use same approach for redis container start

b50a4c0

This makes the orquesta workflow consistent with ci workflow changes added in #6245

cognifloyd added a commit that referenced this pull request Sep 19, 2024

gha: use same approach for redis container start

a525015

This makes the orquesta workflow consistent with ci workflow changes added in #6245

cognifloyd mentioned this pull request Sep 19, 2024

Update from Mongo 4.4 to Mongo 7 in CI #6246

Merged

cognifloyd added a commit that referenced this pull request Sep 26, 2024

gha: use same approach for redis container start

d6ddfd3

This makes the orquesta workflow consistent with ci workflow changes added in #6245

cognifloyd mentioned this pull request Nov 23, 2024

Make tests safer to run in parallel by changing the Redis key namespace #6283

Merged
