Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report errors to the Rails error reporter #373

Merged
merged 2 commits into from
Nov 15, 2024
Merged

Report errors to the Rails error reporter #373

merged 2 commits into from
Nov 15, 2024

Conversation

npezza93
Copy link
Contributor

@npezza93 npezza93 commented Oct 6, 2024

This was kinda gnarly to track down so apologies in advance for the crazy repro steps.


When running solid queue i noticed errors were not getting reported to the error reporter. I initially thought it was a rails problem but digging in more i noticed they are successfully reported on other adapters(like sidekiq) just not SQ.
So i created this repo to help show off the issue: https://github.com/npezza93/job_error_test

gh repo clone https://github.com/npezza93/job_error_test
cd job_error_test
bundle
bin/rails db:prepare
bin/rails s

bin/rails c
ErrorJob.perform_later

Once you enqueue the job, go to the server and you should hit a bunch of irb bindings.
The first is inside the ActiveSupport reloader. Nothing really to note here but in case you wanted to look around.
The next is inside the ExecutionWrapper. This is where the issue lies. Inside the wrap method is where errors are rescued and then reported. But if you call active? it is true here and so it exits and doesnt rescue any errors. This execution wrapper gets setup inside activejob/lib/active_job/execution.rb and the execute callback is defined in activejob/lib/active_job/railtie.rb:78. (heres a full diff of changes i made to rails: rails/rails@main...npezza93:rails:queue-test)

So from what i can tell wrapping the thread_execution inside Pool with wrap_in_app_executor basically wraps active job twice and so the ExecutionWrapper thinks it's already active.

In that test repo i have my branch on solid queue commented out in the Gemfile. If you uncomment and bundle you can see that active? is now false if you rerun everything.

Im honestly not sure if this wrapping was serving some other purpose that i've now broken. So if it is still needed i think we will need to wrap everything before and everything after ActiveJob::Base.execute(job.arguments) to avoid a double wrapping but let me know.

The errors were getting swallowed which is why the outer executor never saw them. Now i reraise the error when the job fails and then rerescue them on the outside of the executor so the thread doesnt error out. As Jean pointed out "ErrorHandle interface handles repeated reporting correctly" so having the on_thread_error rereport the error is fine as duplicate errors wont be reported.

@npezza93
Copy link
Contributor Author

npezza93 commented Oct 6, 2024

Ill also note the app_executors are the same between the ActiveJob one and the SQ one. So im not entirely sure why the SQ executor never picks up the rescue if the AJ one thinks it's active. I would have expected the error to bubble up but it never does. @byroot if youre in here, maybe you have an idea?

@byroot
Copy link
Member

byroot commented Oct 6, 2024

Isn't it because the call you removed was in a background thread? Hence it's missing lots of context state?

Just a guess, I'd need to dig more as I'm not familiar at all with the SQ codebase.

@npezza93
Copy link
Contributor Author

npezza93 commented Oct 6, 2024

The errors were getting swallowed which is why the outer executor never saw them. Now i reraise the error when the job fails and then rerescue them on the outside of the executor so the thread doesnt error out.

@byroot
Copy link
Member

byroot commented Oct 6, 2024

Ah I see. Perhaps we could evolve Executor#wrap that that when used recursively, the nested call still shortcut, but also still report errors. Especially now that the ErrorHandle interface handles repeated reporting correctly.

@npezza93 npezza93 force-pushed the errors branch 3 times, most recently from 1fe831c to d3a4466 Compare October 6, 2024 17:56
@rosa
Copy link
Member

rosa commented Nov 13, 2024

@npezza93 @byroot, thanks a lot for looking into this, and sorry for the delay! Good catch, I think this makes sense. I think I got confused in the past by looking at what Sentry, Rollbar and other error reporting tools did to report errors with Active Job 🤔 Or perhaps it was something else, but there was a reason not to re-raise there... but this was from the very beginning, so whatever it was, it might not be applicable now. Let me do some tests and review this carefully.

@npezza93
Copy link
Contributor Author

No problem @rosa! I've been testing in prod with this branch and so far so good. Let me know wif you find anything on your end!

Copy link
Member

@rosa rosa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally tested this properly with Sentry and everything is working great. Thank you so much again, @npezza93! 🙇‍♀️ 🙏 💖

@rosa rosa merged commit 42ce2ac into rails:main Nov 15, 2024
4 checks passed
@rosa
Copy link
Member

rosa commented Nov 16, 2024

Published version 1.0.2 with this fix 😊 Thanks again!

adamlogic added a commit to judoscale/judoscale-ruby that referenced this pull request Dec 30, 2024
Errors are no longer swallowed, so our test must rescue them. rails/solid_queue#373
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants