Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

redisTestHook,memcachedTestHook: init; prevent hanging Darwin build after test failures #357879

Open
wants to merge 11 commits into
base: staging
Choose a base branch
from

Conversation

ofalvai
Copy link
Contributor

@ofalvai ofalvai commented Nov 21, 2024

Looking at Hydra's "timed out jobs" list, I noticed many Darwin timeouts where the last log lines were all about Redis. It turns out the background Redis job is not terminated correctly on macOS when checkPhase fails (thus postCheck never runs).

I reviewed all Redis test usage in pkgs and added the same workaround, as well as connection retry where it was missing.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@ofborg ofborg bot added the 6.topic: darwin Running or building packages on Darwin label Nov 22, 2024
@ofborg ofborg bot requested a review from MrMebelMan November 22, 2024 08:26
@ofalvai ofalvai force-pushed the darwin-redis-test-timeout branch from 269df0a to 1444c42 Compare November 30, 2024 19:17
@ofalvai ofalvai changed the base branch from master to staging November 30, 2024 19:31
@ofalvai ofalvai marked this pull request as ready for review November 30, 2024 20:49
@ofalvai ofalvai requested a review from a team November 30, 2024 20:50
@afh
Copy link
Member

afh commented Dec 1, 2024

It seems this PR adds almost the same code to the preCheck phase of various packages. Would it be possible and feasible to refactor that code into a script that can be re-used by these packages?

@ofalvai
Copy link
Contributor Author

ofalvai commented Dec 1, 2024

I was thinking the same, but I'm not familiar enough with nixpkgs to know how to approach this. Do you have a specific idea in mind? Maybe a hook?

@ofalvai ofalvai changed the title treewide: prevent hanging Darwin build after test failures various: prevent hanging Darwin build after test failures Dec 5, 2024
@toonn
Copy link
Contributor

toonn commented Dec 28, 2024

I asked in the NixOS dev room and emily suggested a hook indeed. And in addition to mark the failing packages as broken or badPlatforms.

@mweinelt
Copy link
Member

Create a hook to integrate redis into the check phase.

Check out pkgs/by-name/po/postgresqlTestHook which has a similar use case.

@toonn
Copy link
Contributor

toonn commented Dec 28, 2024

Emily also mentioned the following:

I'm also not sure that MAX_RETRIES stuff is a good idea since Hydra builders are often under heavy load and test timeouts usually just break things unnecessarily

I sort of suspect fixing the underlying issue is just a matter of putting a nohup in front of the server call

I'm not sure nohup would prevent the process being assigned to launchd but worth looking into I suppose.

@ofalvai ofalvai force-pushed the darwin-redis-test-timeout branch from 1444c42 to 52fcb42 Compare January 23, 2025 20:27
@github-actions github-actions bot added 8.has: documentation This PR adds or changes documentation and removed 10.rebuild-linux: 501+ labels Jan 23, 2025
@github-actions github-actions bot removed 6.topic: haskell 6.topic: kernel The Linux kernel 8.has: module (update) This PR changes an existing module in `nixos/` 6.topic: vim 6.topic: erlang 6.topic: nodejs 6.topic: coq "A formal proof management system" 6.topic: vscode 6.topic: php 8.has: maintainer-list (update) This PR changes `maintainers/maintainer-list.nix` 6.topic: nvidia 6.topic: xen-project The Xen Project hypervisor labels Jan 24, 2025
@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Jan 24, 2025
Copy link
Member

@mweinelt mweinelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that looks pretty good already!

Comment on lines 6 to 19
```nix
{ stdenv, memcachedTestHook }:
stdenv.mkDerivation {
# ...
nativeCheckInputs = [
memcachedTestHook
];
}
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format codeblocks with nixfmt-rfc-style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, as well as the redis hook docs

Comment on lines +31 to +37
Bash-only variables:

- `redisTestPort`: Port to use by Redis. Defaults to `6379`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also expose redis over a unix domain socket?

Comment on lines +31 to +36
Bash-only variables:

- `memcachedTestPort`: Port to use by Memcached. Defaults to `11211`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also expose the unix domain socket.

MEMCACHED_PID=$!

while ! (echo 'quit' | nc localhost "$memcachedTestPort") ; do
echo 'waiting for memcached to be ready'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just print that once, otherwise it could become very spammy on busy builders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, as well as in redisHook

Comment on lines 10 to 13
propagatedBuildInputs = [
memcached
netcat
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure we want to propagate these packages. We could just use buildInputs, no? Or use replaceVars on the hook script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I pushed a fix using substitutions (makeSetupHook has no buildInputs)

@ofalvai ofalvai force-pushed the darwin-redis-test-timeout branch from b1bfc93 to b397670 Compare February 4, 2025 07:25
@ofborg ofborg bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Feb 4, 2025
@ofalvai ofalvai force-pushed the darwin-redis-test-timeout branch from b397670 to d267cec Compare February 4, 2025 07:29
@ofalvai
Copy link
Contributor Author

ofalvai commented Feb 4, 2025

Thank you for the review @mweinelt. The easier ones are fixed now. Next, I'll figure out the unix domain socket thing and then resolve the merge conflict.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 6.topic: darwin Running or building packages on Darwin 6.topic: python 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 11-100 10.rebuild-linux: 2501-5000
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants