Skip to content

Commit

Permalink
🐛 [#5058] Fix race conditions leading to database errors
Browse files Browse the repository at this point in the history
**Removed the submission value variable recoupling**

Before this patch, we ran some reconciliation when a form is edited to
ensure that existing SubmissionValueVariable instances don't have
'null' form_variable foreign keys because the FormVariable set of a
form was updated after editing (a) step(s) - the idea here was to be
eventually consistent. This turns out not to be necessary (if we can
trust our test suite) because the load_submission_value_variables_state
method on the Submission operates on the variable keys rather than
the FK relation and is able to properly resolve everything. This is
also the interface that developers should use when accessing the
submission values and it appears to be done properly in the
registration backends, otherwise tests would likely fail.

This re-coupling was extended in #4900, after it was noticed in #4824
that the re-coupling didn't happen for other forms that use the same
re-usable form definition. At the time, we didn't understand how this
seemingly didn't cause issues or at least didn't result in issues
being reported to us, but we can now conclude that it just wasn't a
problem in the first place because the proper interfaces/service layer
are/were being used and everything is done/reconciled in-memory when
comparing/populating submission variables with form variables.

A further cleanup step will then also be to remove this FK field from
the submission value variable model, as it is not necessary.

**Run form variable sync in same transaction**

Offloading this to celery results in more, separate tasks each competing
for database locks and can lead to integrity errors.

Changing the business logic of form variable creation/update to
make sure a FormDefinition write results in a sync action gives
a single trigger for these kind of changes, and it's responsible
for updating all the forms that are affected by it.

Since multiple FormDefinition/FormStep writes happen in parallel
because of parallel client requests, we can only update or
create variables and can't delete variables that are not present
in our current form definition, as they may belong to another
request. This shouldn't immediately cause problems, as old,
unused variables don't have an 'execution path'. We piggy
back on the variables bulk update endpoint/transaction to
clean these up, as that call is made after all the form step
persistence succeeded so we know that everything is resolved
by then.

The left-over variables problem only exists when updating a
form step to use a different form definition. It's not
relevant when a form definition is updated through the
admin or a form step is added to a form.

**Add test for expected new behaviour of bulk variable update endpoint**

The bulk update endpoint may no longer manage the variables for form
steps, as that's taken care of by the form step endpoint now. But,
user defined variables must still be managed.

**Use the form variables bulk update for consistency**

Instead of triggering celery tasks, perform the variable state
synchronization for a form in the bulk update endpoint. This
endpoint now still validates the state of all component/user
defined variables, but no longer drops all the variables and
re-creates them, instead it only touches user defined
variables and leaves the form step variables alone.

Since this is called after the steps have been sorted out,
a complete view of left-over variables from earlier
form definitions is available and those can be cleaned
up safely now.

Backport-of: #5064
  • Loading branch information
sergei-maertens committed Jan 31, 2025
1 parent fc3bacd commit 4b486c2
Show file tree
Hide file tree
Showing 12 changed files with 526 additions and 819 deletions.
6 changes: 5 additions & 1 deletion src/openforms/api/serializers.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
from typing import Any

from django.utils.module_loading import import_string
from django.utils.translation import gettext_lazy as _

Expand Down Expand Up @@ -69,6 +71,7 @@ class ValidationErrorSerializer(ExceptionSerializer):

class ListWithChildSerializer(serializers.ListSerializer):
child_serializer_class = None # class or dotted import path
bulk_create_kwargs: dict[str, Any] | None = None

def __init__(self, *args, **kwargs):
child_serializer_class = self.get_child_serializer_class()
Expand All @@ -94,7 +97,8 @@ def create(self, validated_data):
obj = model(**data_dict)
objects_to_create.append(self.process_object(obj))

return model._default_manager.bulk_create(objects_to_create)
kwargs = self.bulk_create_kwargs or {}
return model._default_manager.bulk_create(objects_to_create, **kwargs)


class PublicFieldsSerializerMixin:
Expand Down
7 changes: 6 additions & 1 deletion src/openforms/forms/admin/form_definition.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from django.contrib import admin, messages
from django.contrib.admin.actions import delete_selected as _delete_selected
from django.db.models import Prefetch
from django.http import HttpRequest
from django.urls import reverse
from django.utils.html import format_html, format_html_join
from django.utils.translation import gettext_lazy as _
Expand All @@ -9,7 +10,7 @@

from ...utils.expressions import FirstNotBlank
from ..forms import FormDefinitionForm
from ..models import FormDefinition, FormStep
from ..models import FormDefinition, FormStep, FormVariable
from .mixins import FormioConfigMixin


Expand Down Expand Up @@ -100,3 +101,7 @@ def used_in_forms(self, obj) -> str:
return format_html("<ul>{}</ul>", ret)

used_in_forms.short_description = _("used in")

def save_model(self, request: HttpRequest, obj: FormDefinition, form, change):
super().save_model(request=request, obj=obj, form=form, change=change)
FormVariable.objects.synchronize_for(obj)
12 changes: 4 additions & 8 deletions src/openforms/forms/api/serializers/form_step.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
from functools import partial

from django.db import transaction
from django.db.models import Q

Expand All @@ -12,8 +10,7 @@
ModelTranslationsSerializer,
)

from ...models import FormDefinition, FormStep
from ...tasks import on_formstep_save_event
from ...models import FormDefinition, FormStep, FormVariable
from ...validators import validate_no_duplicate_keys_across_steps
from ..validators import FormStepIsApplicableIfFirstValidator
from .button_text import ButtonTextSerializer
Expand Down Expand Up @@ -181,12 +178,11 @@ def save(self, **kwargs):
Bandaid fix for #4824
Ensure that the FormVariables are in line with the state of the FormDefinitions
after saving
after saving.
"""
instance = super().save(**kwargs)

transaction.on_commit(
partial(on_formstep_save_event, instance.form.id, countdown=60)
)
# call this synchronously so that it's part of the same DB transaction.
FormVariable.objects.synchronize_for(instance.form_definition)

return instance
45 changes: 36 additions & 9 deletions src/openforms/forms/api/serializers/form_variable.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,21 +16,48 @@
from ...models import Form, FormDefinition, FormVariable


def save_fetch_config(data):
if config := data.get("service_fetch_configuration"):
config.save()
data["service_fetch_configuration"] = config.instance
return data


class FormVariableListSerializer(ListWithChildSerializer):
bulk_create_kwargs = {
"update_conflicts": True,
"update_fields": (
"name",
"key",
"source",
"service_fetch_configuration",
"prefill_plugin",
"prefill_attribute",
"prefill_identifier_role",
"prefill_options",
"data_type",
"data_format",
"is_sensitive_data",
"initial_value",
),
"unique_fields": ("form", "key"),
}

def get_child_serializer_class(self):
return FormVariableSerializer

def process_object(self, variable: FormVariable):
variable.check_data_type_and_initial_value()
return variable
def process_object(self, obj: FormVariable):
obj.check_data_type_and_initial_value()
return obj

def preprocess_validated_data(self, validated_data):
def save_fetch_config(data):
if config := data.get("service_fetch_configuration"):
config.save()
data["service_fetch_configuration"] = config.instance
return data

# only process not-component variables, as those are managed via the form
# step endpoint
validated_data = [
item
for item in validated_data
if (item["source"] != FormVariableSources.component)
]
return map(save_fetch_config, validated_data)

def validate(self, attrs):
Expand Down
23 changes: 14 additions & 9 deletions src/openforms/forms/api/viewsets.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
import inspect
from functools import partial
from uuid import UUID

from django.conf import settings
Expand Down Expand Up @@ -28,7 +27,6 @@

from ..messages import add_success_message
from ..models import Form, FormDefinition, FormStep, FormVersion
from ..tasks import on_variables_bulk_update_event
from ..utils import export_form, import_form
from .datastructures import FormVariableWrapper
from .documentation import get_admin_fields_markdown
Expand Down Expand Up @@ -493,10 +491,6 @@ def admin_message(self, request, *args, **kwargs):
@transaction.atomic
def variables_bulk_update(self, request, *args, **kwargs):
form = self.get_object()
form_variables = form.formvariable_set.all()
# We expect that all the variables that should be associated with a form come in the request.
# So we can delete any existing variables because they will be replaced.
form_variables.delete()

serializer = FormVariableSerializer(
data=request.data,
Expand All @@ -515,9 +509,20 @@ def variables_bulk_update(self, request, *args, **kwargs):
},
)
serializer.is_valid(raise_exception=True)
serializer.save()

transaction.on_commit(partial(on_variables_bulk_update_event, form.id))
variables = serializer.save()

# clean up the stale variables:
# 1. component variables that are no longer related to the form
stale_component_vars = form.formvariable_set.exclude(
form_definition__formstep__form=form
).filter(source=FormVariableSources.component)
stale_component_vars.delete()
# 2. User defined variables not present in the submitted variables
keys_to_keep = [variable.key for variable in variables]
stale_user_defined = form.formvariable_set.filter(
source=FormVariableSources.user_defined
).exclude(key__in=keys_to_keep)
stale_user_defined.delete()

return Response(serializer.data, status=status.HTTP_200_OK)

Expand Down
116 changes: 112 additions & 4 deletions src/openforms/forms/models/form_variable.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
from typing import TYPE_CHECKING
from __future__ import annotations

from copy import deepcopy
from typing import TYPE_CHECKING, ClassVar

from django.db import models, transaction
from django.db.models import CheckConstraint, Q
Expand Down Expand Up @@ -33,9 +36,9 @@ def create_for_form(self, form: "Form") -> None:
form_steps = form.formstep_set.select_related("form_definition")

for form_step in form_steps:
self.create_for_formstep(form_step)
self.synchronize_for(form_step.form_definition)

def create_for_formstep(self, form_step: "FormStep") -> list["FormVariable"]:
def create_for_formstep(self, form_step: FormStep) -> list[FormVariable]:
form_definition_configuration = form_step.form_definition.configuration
component_keys = [
component["key"]
Expand Down Expand Up @@ -95,6 +98,109 @@ def create_for_formstep(self, form_step: "FormStep") -> list["FormVariable"]:

return self.bulk_create(form_variables)

@transaction.atomic
def synchronize_for(self, form_definition: FormDefinition):
"""
Synchronize the form variables for a given form definition.
This creates, updates and/or removes form variables related to the provided
:class:`FormDefinition` instance. It needs to be called whenever the Formio
configuration of a form definition is changed so that our form variables in each
form making use of the form definition accurately reflect the configuration.
Note that we *don't* remove variables related to other form definitions, as
multiple isolated transactions for different form definitions can happen at the
same time.
"""
# Build the desired state
desired_variables: list[FormVariable] = []
# XXX: looping over the configuration_wrapper is not (yet) viable because it
# also yields the components nested inside edit grids, which we need to ignore.
# So, we stick to iter_components. Performance wise this should be okay since we
# only need to do one pass.
configuration = form_definition.configuration
for component in iter_components(configuration=configuration, recursive=True):
# we need to ignore components that don't actually hold any values - there's
# no point to create variables for those.
if is_layout_component(component):
continue
if component["type"] in ("content", "softRequiredErrors"):
continue
if component_in_editgrid(configuration, component):
continue

# extract options from the component
prefill_plugin = glom(component, "prefill.plugin", default="") or ""
prefill_attribute = glom(component, "prefill.attribute", default="") or ""
prefill_identifier_role = glom(
component, "prefill.identifierRole", default=IdentifierRoles.main
)

desired_variables.append(
self.model(
form=None, # will be set later when visiting all affected forms
key=component["key"],
form_definition=form_definition,
prefill_plugin=prefill_plugin,
prefill_attribute=prefill_attribute,
prefill_identifier_role=prefill_identifier_role,
name=component.get("label") or component["key"],
is_sensitive_data=component.get("isSensitiveData", False),
source=FormVariableSources.component,
data_type=get_component_datatype(component),
initial_value=get_component_default_value(component),
)
)

desired_keys = [variable.key for variable in desired_variables]

# if the Formio configuration of the form definition itself is updated and
# components have been removed or their keys have changed, we know for certain
# we can discard those old form variables - it doesn't matter which form they
# belong to.
stale_variables = self.filter(form_definition=form_definition).exclude(
key__in=desired_keys
)
stale_variables.delete()

# check which form (steps) are affected and patch them up. It is irrelevant whether
# the form definition is re-usable or not, though semantically at most one form step
# should be found for single-use form definitions.
# fmt: off
affected_form_steps = (
form_definition
.formstep_set # pyright: ignore[reportAttributeAccessIssue]
.select_related("form")
)
# fmt: on

# Finally, collect all the instances and efficiently upsert them - creating missing
# variables and updating existing variables in a single query.
to_upsert: list[FormVariable] = []
for step in affected_form_steps:
for variable in desired_variables:
form_specific_variable = deepcopy(variable)
form_specific_variable.form = step.form
to_upsert.append(form_specific_variable)

self.bulk_create(
to_upsert,
# enables UPSERT behaviour so that existing records get updated and missing
# records inserted
update_conflicts=True,
update_fields=(
"prefill_plugin",
"prefill_attribute",
"prefill_identifier_role",
"name",
"is_sensitive_data",
"source",
"data_type",
"initial_value",
),
unique_fields=("form", "key"),
)


class FormVariable(models.Model):
form = models.ForeignKey(
Expand Down Expand Up @@ -187,7 +293,9 @@ class FormVariable(models.Model):
null=True,
)

objects = FormVariableManager()
objects: ClassVar[ # pyright: ignore[reportIncompatibleVariableOverride]
FormVariableManager
] = FormVariableManager()

class Meta:
verbose_name = _("Form variable")
Expand Down
Loading

0 comments on commit 4b486c2

Please sign in to comment.