Call Celery task with `transaction.on_commit` #73
Conversation
In `django_yubin/models.py`:

```diff
@@ -254,7 +255,7 @@ def enqueue(self, log_message=None):
         self.mark_as(self.STATUS_QUEUED, log_message)

         try:
-            tasks.send_email.delay(self.pk)
+            transaction.on_commit(partial(tasks.send_email.delay, message_pk=self.pk))
             return True
         except Exception as e:
```
I'm not sure having an `except` block makes sense here, now that we have `transaction.on_commit`? Also, `send_email` calls `send_db_message`, which is already in an atomic block.
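For illustration, my reading is that the method could shrink to something like this (a sketch, not the actual patch; the import path is assumed):

```python
from functools import partial

from django.db import transaction

from django_yubin import tasks  # import path assumed


def enqueue(self, log_message=None):
    self.mark_as(self.STATUS_QUEUED, log_message)
    # on_commit() only registers the callback; .delay() itself runs
    # after the surrounding transaction commits, so the old try/except
    # around the broker call no longer guards anything at this point.
    transaction.on_commit(partial(tasks.send_email.delay, message_pk=self.pk))
    return True
```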
@sastred I've discussed this with my colleague a bit. The entire flow of message queuing and status updates now seems to be built on the assumption that `send_mail` runs in an autocommit block, but this is really a bad assumption.

Especially when sending confirmation emails (e.g. for a webshop), you absolutely do not want the confirmation email to go out if the actual order placement fails for whatever reason. Typically you protect against these things by marking the calling code with `@transaction.atomic`. In the current version this crashes in the Celery task because the email message doesn't exist, but that clutters our (Sentry) error monitoring with errors that aren't actually errors but intended behaviour, which hurts the signal-to-noise ratio of our error monitoring.
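A minimal sketch of the failure mode as I understand the current flow (the view and `create_order` helper are hypothetical):

```python
from django.core.mail import send_mail
from django.db import transaction


@transaction.atomic  # e.g. a webshop checkout view
def place_order(request):
    order = create_order(request)  # hypothetical helper; nothing committed yet
    # With the current code, yubin saves the message row and calls
    # tasks.send_email.delay() right here, inside the open transaction.
    # A worker can pick the task up before the commit, or after a
    # rollback, and crash on a message that is not visible in the DB.
    send_mail(
        "Order confirmation",
        "Thanks for your order!",
        "shop@example.com",
        [order.email],
    )
```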
It would probably make more sense to break up the individual steps in a Celery chain (rough sketch below):

- persist the message in the database (could be part of your normal `@transaction.atomic`)
- asynchronously mark the message as enqueued (to get logging information even if something goes wrong)
- asynchronously send the message & log any results of that
Obviously all of this is out of scope for this bugfix, but it'd be good to take a critical look at this implementation.
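Purely as an illustration, such a chain could look something like this (task names and bodies are hypothetical, not yubin's API):

```python
from celery import chain, shared_task


@shared_task
def mark_enqueued(message_pk):
    # hypothetical: record STATUS_QUEUED asynchronously, so the log
    # entry exists even if the send step fails later
    ...
    return message_pk


@shared_task
def send_and_log(message_pk):
    # hypothetical: actually deliver the message and persist the result
    ...


def enqueue_message(message_pk):
    # the message row itself was persisted inside the caller's normal
    # @transaction.atomic block; the chain runs the async steps in order
    chain(mark_enqueued.s(message_pk), send_and_log.s()).delay()
```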
Hi @Viicos and @sergei-maertens,

Thanks for the PR. I agree with the solution, but it breaks 2 tests: https://github.com/APSL/django-yubin/actions/runs/6796969337/job/18927395933

Could you fix them? You will probably need to patch `transaction.on_commit` instead of `django_yubin.tasks.send_email.delay`.
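Something along these lines, assuming the existing tests mock `django_yubin.tasks.send_email.delay` (the test and factory names are illustrative):

```python
from unittest.mock import patch

from django.test import TestCase


class EnqueueTests(TestCase):
    # Let on_commit run its callback immediately so the existing
    # assertions on send_email.delay still fire inside the test.
    @patch("django.db.transaction.on_commit",
           side_effect=lambda func, using=None: func())
    @patch("django_yubin.tasks.send_email.delay")
    def test_enqueue_triggers_send_email(self, delay_mock, on_commit_mock):
        message = make_test_message()  # hypothetical factory helper
        self.assertTrue(message.enqueue())
        delay_mock.assert_called_once_with(message_pk=message.pk)
```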
Wouldn't this be resolved with a `with self.captureOnCommitCallbacks(execute=True):` in the tests?
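That's Django's built-in hook for exactly this (available since Django 3.2); a sketch, with the message setup left hypothetical:

```python
from django.test import TestCase


class EnqueueTests(TestCase):
    def test_enqueue_registers_on_commit_callback(self):
        message = make_test_message()  # hypothetical factory helper
        # execute=True runs the captured callbacks on exit, as if the
        # surrounding transaction had just committed.
        with self.captureOnCommitCallbacks(execute=True) as callbacks:
            self.assertTrue(message.enqueue())
        self.assertEqual(len(callbacks), 1)
```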
I have tried to fix the broken tests, but it's not enough; more changes are needed in the code. I have also thought again about your comment, and I'm not sure I fully understand your problem with sending emails. It looks like you are trying to process orders and send emails in the same database transaction. If that's the case, why not finish the order transaction first and, if it went well, send the emails in a second step?
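Something like this, with hypothetical helper names:

```python
from django.db import transaction


def place_order(request):
    # Step 1: the order transaction commits (or rolls back) on its own.
    with transaction.atomic():
        order = create_order(request)  # hypothetical helper

    # Step 2: only reached if the commit succeeded.
    send_confirmation_email(order)  # hypothetical helper
```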
> It looks like you are trying to process orders and send emails in the same database transaction.
Not necessarily. If the database itself is slow and the Celery job is picked up by a worker before the transaction is committed (which may well be outside our own control, or simply unknown, since Django itself has code that runs in atomic transactions, like the built-in password reset views!), the e-mail sending task fails, and it does so silently.
There is no `retry` specified on the task definitions either, so this does not autoheal.
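For illustration, a task defined with Celery's standard retry options would autoheal; this sketch is hypothetical, not what yubin currently ships:

```python
from celery import shared_task


# Hypothetical sketch: retry with exponential backoff instead of
# failing silently when the message row is not visible yet.
@shared_task(autoretry_for=(Exception,), retry_backoff=True, max_retries=5)
def send_email(message_pk):
    from django_yubin import engine  # actual signature/layout may differ
    engine.send_db_message(message_pk)
```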
Additionally, we do not want to commit a transaction that, for unknown reasons, crashes during email scheduling, so moving every `send_mail` call into a `transaction.on_commit` callback is not desired. We also require the email scheduling itself to be part of the transaction for audit trail reasons: every email sending call also creates audit trail records to prove and pinpoint for which object an email was scheduled.
I get your perspective on breaking these things out, but unfortunately reality is more complex than that, and the current implementation is simply not guaranteed to be resistant to race conditions, even with autocommit.
One additional thought that arises while typing this out: what if your own code at some point requires a `@transaction.atomic`? Say, creating the email record and creating an immediate log record that the mail was scheduled?
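For example (the model and helper names are hypothetical):

```python
from django.db import transaction


@transaction.atomic  # the email record and its log record must commit together
def notify_user(user):
    message = queue_email_for(user)  # hypothetical: creates the yubin message
    ScheduleLog.objects.create(      # hypothetical audit model
        user=user,
        message_pk=message.pk,
        event="email_scheduled",
    )
    # Without on_commit, a worker may process the send task before this
    # block commits, or after it rolls back.
```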
Hi @sergei-maertens and @Viicos,

Sorry for taking so long, I have been quite busy these past weeks.

I have been thinking about the simplest solution. I think it can be to send the email `on_commit()`, as you pointed out, and to move all the database operations into the Celery task (`engine.send_db_message()`). This way, emails will be sent after all database operations finish successfully, and there shouldn't be race conditions.
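Roughly, the idea is something like this (the exact details live in the branch; the module layout is assumed from this thread):

```python
from functools import partial

from celery import shared_task
from django.db import transaction

from django_yubin import engine, tasks  # module layout assumed


# models.py sketch: enqueue() only schedules the task for after commit,
# without touching message status inside the still-open transaction.
def enqueue(self):
    transaction.on_commit(partial(tasks.send_email.delay, message_pk=self.pk))
    return True


# tasks.py sketch: all DB work (status updates, sending, result logging)
# happens here, after the caller's transaction has definitely committed.
@shared_task
def send_email(message_pk):
    engine.send_db_message(message_pk)
```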
I have implemented it in the branch https://github.com/APSL/django-yubin/tree/db_transactions-celery-race_condition and it looks like it works well at small scale (unit tests and manual QA).

Do you have a bigger test scenario, like the one you described before, to test it at a more realistic scale?
Hi - no worries, we all have reality hitting us :)

We've been running the patch in this PR in production, by patching our Docker image build after the requirements install, and it has solved our issues. So, if the alternative fix follows the same principles, then I expect a smooth transition.

We can of course install a beta or pre-release and do some manual testing too on an environment if you'd prefer that.
Fixes #72.