
Call Celery task with transaction.on_commit #73

Closed
wants to merge 1 commit
5 changes: 3 additions & 2 deletions django_yubin/models.py
@@ -4,14 +4,15 @@
 from email import policy
 from email import encoders as Encoders
 from email.mime.base import MIMEBase
+from functools import partial
 
 from django.core.exceptions import FieldError
 from django.core.mail.message import (
     ADDRESS_HEADERS,
     EmailMessage,
     EmailMultiAlternatives,
 )
-from django.db import models
+from django.db import models, transaction
 from django.db.models import F
 from django.utils.module_loading import import_string
 from django.utils.text import Truncator
@@ -254,7 +255,7 @@ def enqueue(self, log_message=None):
         self.mark_as(self.STATUS_QUEUED, log_message)
 
         try:
-            tasks.send_email.delay(self.pk)
+            transaction.on_commit(partial(tasks.send_email.delay, message_pk=self.pk))
             return True
         except Exception as e:
Contributor Author
I'm not sure the except block still makes sense here, now that we use transaction.on_commit. Also, send_email calls send_db_message, which already runs in an atomic block.
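For readers following along, here is a minimal, self-contained sketch of what the transaction.on_commit change buys us. The FakeTransaction class and send_email_delay function below are stand-ins for Django's transaction machinery and tasks.send_email.delay, not the real implementations:

```python
from functools import partial

class FakeTransaction:
    """Stand-in for Django's transaction machinery (illustrative only)."""
    def __init__(self):
        self.callbacks = []

    def on_commit(self, func):
        # Django defers the callback until the outermost atomic block commits.
        self.callbacks.append(func)

    def commit(self):
        for func in self.callbacks:
            func()
        self.callbacks = []

    def rollback(self):
        # On rollback the callbacks are discarded: the task never fires.
        self.callbacks = []

sent = []

def send_email_delay(message_pk):
    # Stand-in for tasks.send_email.delay.
    sent.append(message_pk)

tx = FakeTransaction()
# Equivalent of the patched enqueue(): defer the Celery call until commit.
tx.on_commit(partial(send_email_delay, message_pk=42))
assert sent == []      # nothing is sent while the transaction is open
tx.commit()
assert sent == [42]    # the task fires only after a successful commit
```

With the old direct .delay() call, the worker could pick up the task before the row was committed; deferring via on_commit removes that window, and a rollback discards the callback entirely.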

Contributor
@sastred I've discussed this with my colleague a bit - the entire flow of message queuing and status updates now seems to be built on the assumption that send_mail runs in autocommit mode, but this is really a bad assumption.

Especially when sending confirmation emails (e.g. for a webshop), you absolutely do not want to send a confirmation email if the actual order placement fails for whatever reason. Typically you protect against this by marking the calling code with @transaction.atomic. In the current version this crashes in the Celery task because the email message doesn't exist yet, but that clutters our (Sentry) error monitoring with errors that aren't actually errors but intended behaviour. The signal-to-noise ratio of error monitoring suffers from this.

It would probably make more sense to break the individual steps up into a Celery chain:

  1. persist the message in the database (could be part of your normal @transaction.atomic)
  2. async mark the message as enqueued (to get logging information even if something goes wrong)
  3. async send the message & log any results of that

Obviously all of this is out of scope for this bugfix, but it'd be good to take a critical look at this implementation.
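The proposed three-step split can be sketched in plain Python (all names here are made up for illustration; in real code steps 2 and 3 would be Celery tasks composed with something like celery.chain(mark_enqueued.si(pk), send_message.si(pk))):

```python
# Hypothetical three-step flow; in Celery, steps 2 and 3 would be chained tasks.
log = []

def persist_message(message):
    # Step 1: runs inside the caller's @transaction.atomic block.
    log.append(("persisted", message["pk"]))
    return message["pk"]

def mark_enqueued(pk):
    # Step 2: async; records state so there is logging even if the send fails.
    log.append(("enqueued", pk))
    return pk

def send_message(pk):
    # Step 3: async; actually send the message and log the result.
    log.append(("sent", pk))
    return pk

pk = persist_message({"pk": 7})
# The chain equivalent: each step's result feeds the next step.
for step in (mark_enqueued, send_message):
    pk = step(pk)

assert log == [("persisted", 7), ("enqueued", 7), ("sent", 7)]
```

The design point is that only step 1 participates in the caller's transaction; steps 2 and 3 happen after commit, so a rollback never leaves a half-sent email behind.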

Member
Hi @Viicos and @sergei-maertens

Thanks for the PR. I agree with the solution but it breaks 2 tests: https://github.com/APSL/django-yubin/actions/runs/6796969337/job/18927395933


Could you fix them? Probably you will need to patch transaction.on_commit instead of django_yubin.tasks.send_email.delay.
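That patching approach can be sketched with unittest.mock; the FakeModels class below stands in for the django_yubin.models module namespace, and the commented-out patch target shows what the real test would use:

```python
from unittest import mock

class FakeModels:
    """Stand-in for the django_yubin.models module namespace."""
    @staticmethod
    def on_commit(func):
        raise AssertionError("the real on_commit should not run in this test")

models = FakeModels()

captured = []
# In a real test this would be something like:
#   mock.patch("django_yubin.models.transaction.on_commit", new=captured.append)
with mock.patch.object(models, "on_commit", new=captured.append):
    models.on_commit(lambda: None)   # what the patched enqueue() now does

assert len(captured) == 1            # the callback was captured, not executed
captured[0]()                        # the test may invoke it explicitly
```

Patching on_commit (rather than the task's .delay) is needed because, outside a real transaction, the deferred callback would otherwise never fire during the test.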

Contributor

Wouldn't this be resolved with a with self.captureOnCommitCallbacks(execute=True): block in the tests?
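For context, Django's TestCase.captureOnCommitCallbacks collects the callbacks registered during the block and (with execute=True) runs them on exit. A tiny re-implementation of that idea, with a hand-rolled pending queue standing in for the database connection's run_on_commit list:

```python
from contextlib import contextmanager

pending = []  # stand-in for the connection's queue of on_commit callbacks

def on_commit(func):
    pending.append(func)

@contextmanager
def capture_on_commit_callbacks(execute=False):
    """Sketch of what TestCase.captureOnCommitCallbacks does (illustrative)."""
    start = len(pending)
    callbacks = []
    try:
        yield callbacks
    finally:
        # Collect whatever was registered inside the block...
        callbacks.extend(pending[start:])
        del pending[start:]
        # ...and optionally execute it, as execute=True does in Django.
        if execute:
            for func in callbacks:
                func()

fired = []
with capture_on_commit_callbacks(execute=True) as callbacks:
    on_commit(lambda: fired.append("sent"))  # what enqueue() now does
    assert fired == []   # not executed while "inside the transaction"

assert len(callbacks) == 1
assert fired == ["sent"]  # executed when the context exits
```

In the real test suite this would let the existing tests keep asserting that the task ran, without patching anything.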

Member
I have tried to fix the broken tests but it's not enough; more changes are needed in the code. I have thought again about your comment and I'm not sure I fully understand your problem with sending emails. It looks like you are trying to process orders and send emails in the same database transaction. If that's the case, why not finish the order transaction first and, if it succeeds, send the emails in a second step?

Contributor
@sergei-maertens Nov 24, 2023

It looks like that you are trying to process orders and send emails in the same database transaction.

Not necessarily. If the database itself is slow and the Celery job is picked up by a worker before the transaction is committed (which may well be outside our control or simply unknown, since Django itself has code that runs in atomic transactions, like the built-in password reset views!), the e-mail sending task fails, and it does so silently.

There is no retry specified on the task definitions either, so this does not autoheal.

Additionally, we do not want to commit a transaction which for unknown reasons crashes during email scheduling, so moving every send_mail call into a transaction.on_commit callback is not desired. We also require the email scheduling itself to be part of the transaction for audit trail reasons - every email sending call also creates audit trail records to prove and pinpoint for which object an email was scheduled.

I get your perspective on breaking these things out, but unfortunately reality is more complex than that, and the current implementation is simply not guaranteed to be resistant to race conditions, even with autocommit.

One additional thought that arises while typing this out: what if your own code at some point requires @transaction.atomic? Say, creating the email record and an immediate log record that the mail was scheduled?

Member
Hi @sergei-maertens and @Viicos ,

Sorry for taking so long, I have been quite busy these past weeks.

I have been thinking about the simplest solution. I think it is to send the email in on_commit(), as you pointed out, and move all the database operations into the Celery task (engine.send_db_message()). This way, emails are sent only after all database operations finish successfully, and there shouldn't be race conditions.

I have implemented it in the branch https://github.com/APSL/django-yubin/tree/db_transactions-celery-race_condition and it looks like it works well at small scale (unit tests and manual QA).

Do you have a bigger test scenario, like the one you described before, to test it at a more realistic scale?
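A minimal simulation of that division of labour (all names and bodies below are stand-ins for illustration; the real code lives in the linked branch):

```python
statuses = {}   # pk -> status; stand-in for the Message table
outbox = []     # emails actually handed to the backend

def enqueue(pk, on_commit):
    # enqueue() no longer touches the database directly; it only
    # schedules the Celery task to run after the surrounding commit.
    on_commit(lambda: send_db_message(pk))

def send_db_message(pk):
    # All status bookkeeping now happens inside the Celery task,
    # i.e. after the caller's transaction has committed.
    statuses[pk] = "queued"
    outbox.append(pk)
    statuses[pk] = "sent"

callbacks = []
enqueue(1, callbacks.append)
assert outbox == []            # nothing happens before commit
for cb in callbacks:           # simulate the commit firing the hooks
    cb()
assert outbox == [1] and statuses[1] == "sent"
```

Because no status rows are written before commit, a rolled-back transaction leaves neither a stray task nor a half-updated message behind, which addresses the race condition described above.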

self.date_enqueued = backup['date_enqueued']