Pgd/5/consistency/column level conflicts/docs 287 #5848

Merged
merged 32 commits into from
Aug 6, 2024
41dc4c0
Split out into files.
jpe442 Jul 3, 2024
516b0a8
Changes in timestamps file
jpe442 Jul 3, 2024
d3c3e4e
Removed DDL locking as stand-alone page.
jpe442 Jul 8, 2024
44a4a78
fixed index links:
jpe442 Jul 8, 2024
dfedb14
Added description to items in index.mdx
jpe442 Jul 8, 2024
4d0c215
Started integrating notes section.
jpe442 Jul 9, 2024
d0ce177
Continue integrating notes.
jpe442 Jul 9, 2024
5cd4fe2
Small changes.
jpe442 Jul 10, 2024
02de691
Removed some notes and integrated some.
jpe442 Jul 10, 2024
addf140
Cleaned up note integration.
jpe442 Jul 10, 2024
29e6142
Moved column_timestamps_create to reference.
jpe442 Jul 10, 2024
bc788c3
Added pages to index.mdx.src
jpe442 Jul 10, 2024
99374b6
Update product_docs/docs/pgd/5/consistency/column-level-conflicts/02_…
jpe442 Jul 12, 2024
65716cc
Fixed links.
jpe442 Jul 12, 2024
4d155ea
Moved crdt bit to overview and added link.
jpe442 Jul 12, 2024
916c2b1
Small changes.
jpe442 Jul 30, 2024
73b39f4
more small changes.
jpe442 Jul 31, 2024
74f3757
Tweaks in light of PR 5813 complete.
jpe442 Jul 31, 2024
1fbe2b1
Small changes.
jpe442 Aug 1, 2024
7f43712
rewrote timestamps
jpe442 Aug 1, 2024
f1cd7fc
Small change.
jpe442 Aug 1, 2024
4e5be3f
Integrated some of DJ's comments from Slack.
jpe442 Aug 1, 2024
1e85d76
Added DJs rewrite of Timestamps in and deleted old Timestamps topic.
jpe442 Aug 2, 2024
ab9b4e2
More tweaks to align with DJ's feedback.
jpe442 Aug 2, 2024
2de40ea
More little changes.
jpe442 Aug 2, 2024
ea6d54c
Final tweaks and removing original clc mdx file.
jpe442 Aug 2, 2024
256a366
Some change in reference index.mdx.
jpe442 Aug 2, 2024
92e36b9
Add note about PGD proxy as a way to avoid conflicts.
jpe442 Aug 2, 2024
4e30002
Merge branch 'develop' into pgd/5/consistency/column-level-conflicts/…
djw-m Aug 6, 2024
9a4ce27
Removed redirect 01_overview_clcd.mdx
djw-m Aug 6, 2024
fff4372
Remove redirect 02_enabling_disabling.mdx
djw-m Aug 6, 2024
bbadc59
Remove redirect 03_timestamps.mdx
djw-m Aug 6, 2024
239 changes: 0 additions & 239 deletions product_docs/docs/pgd/5/consistency/column-level-conflicts.mdx

This file was deleted.

@@ -0,0 +1,65 @@
---
navTitle: Overview
title: Overview
---

By default, conflicts are resolved at row level. When changes from two nodes conflict, either the local or remote tuple is selected and the other is discarded. For example, commit timestamps for the two conflicting changes might be compared and the newer one kept. This approach ensures that all nodes converge to the same result and establishes commit-order-like semantics on the whole cluster.

However, in some cases it might be more appropriate to resolve conflicts at the column level rather than the row level.

## When to resolve at the column level

Consider a simple example in which table `t` has two integer columns, `a` and `b`, and a single row `(1,1)`. On one node, execute:

```sql
UPDATE t SET a = 100;
```

On another node, before receiving the preceding `UPDATE`, concurrently execute:

```sql
UPDATE t SET b = 100;
```

!!! Note
The attributes modified by an `UPDATE` are determined by comparing the old and new row in a trigger. This means that if an attribute's value doesn't change, it isn't detected as modified even if it's explicitly set. For example, `UPDATE t SET a = a` doesn't mark `a` as modified for any row. Similarly, `UPDATE t SET a = 1` doesn't mark `a` as modified for rows where `a` is already `1`.
!!!
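
The comparison described in the note can be pictured with plain SQL. The following is only a sketch of the idea, not PGD's trigger code: `old_row` and `new_row` are hypothetical stand-ins for the row images before and after an `UPDATE t SET a = 1, b = 100` on the row `(1,1)`.

```sql
-- Illustrative only: old_row and new_row are made-up names that stand in for
-- the OLD and NEW row images a trigger would compare.
WITH old_row(a, b) AS (VALUES (1, 1)),
     new_row(a, b) AS (VALUES (1, 100))
SELECT o.a IS DISTINCT FROM n.a AS a_modified,  -- false: a was set but kept its value
       o.b IS DISTINCT FROM n.b AS b_modified   -- true: b actually changed
FROM old_row o CROSS JOIN new_row n;
```

`IS DISTINCT FROM` is used instead of `<>` so that a change to or from `NULL` also counts as a modification.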

This sequence results in an `UPDATE-UPDATE` conflict. With the [`update_if_newer`](../../reference/conflicts/#default-conflict-resolvers) conflict resolver, the commit timestamps are compared, and the newer row version is kept. Assuming the second node committed last, the result is `(1,100)`, which effectively discards the change to column `a`.
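
As a rough model of this row-level outcome, the following query runs on plain PostgreSQL and picks a whole row version by its commit timestamp. The `versions` relation, the node names, and the timestamps are assumptions made up for illustration. They aren't PGD catalog objects.

```sql
-- Hypothetical model: each node's row version carries a single commit timestamp.
WITH versions(origin, a, b, commit_ts) AS (
  VALUES
    ('node1', 100, 1,   TIMESTAMPTZ '2024-01-01 10:00:01+00'),
    ('node2', 1,   100, TIMESTAMPTZ '2024-01-01 10:00:02+00')
)
SELECT a, b
FROM versions
ORDER BY commit_ts DESC   -- the newest commit wins as a whole row
LIMIT 1;                  -- yields (1, 100): node1's change to a is lost
```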

For many use cases, this behavior is desired and expected. However, for some use cases, it can be a problem. Consider, for example, a multi-node cluster where each part of the application is connected to a different node and updates a dedicated subset of columns in a shared table. In that case, the different components might conflict and overwrite each other's changes.

For such use cases, it might be more appropriate to resolve conflicts on a given table at the column level. To achieve that, PGD tracks the timestamp of the last change for each column separately and uses that to pick the most recent value, essentially performing `update_if_newer` for each column individually.

Applied to the previous example, the result is `(100,100)` on both nodes, despite neither of the nodes ever seeing such a row.
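
The per-column choice can be sketched in plain PostgreSQL by giving each column its own timestamp and picking the newest value for each column independently. The `versions` relation, its `a_ts` and `b_ts` columns, and the timestamp values are hypothetical. PGD stores this information differently.

```sql
-- Hypothetical model: each column has its own last-modified timestamp.
WITH versions(origin, a, a_ts, b, b_ts) AS (
  VALUES
    ('node1', 100, TIMESTAMPTZ '2024-01-01 10:00:01+00', 1,   TIMESTAMPTZ '2024-01-01 10:00:00+00'),
    ('node2', 1,   TIMESTAMPTZ '2024-01-01 10:00:00+00', 100, TIMESTAMPTZ '2024-01-01 10:00:02+00')
)
SELECT (SELECT a FROM versions ORDER BY a_ts DESC LIMIT 1) AS a,  -- newest a: 100 (from node1)
       (SELECT b FROM versions ORDER BY b_ts DESC LIMIT 1) AS b;  -- newest b: 100 (from node2)
```

The query returns `(100, 100)`, a row that neither node produced on its own.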

When thinking about column-level conflict resolution, it can be useful to see tables as vertically partitioned, so that each update affects data in only one slice. This approach eliminates conflicts between changes to different subsets of columns. In fact, vertical partitioning can even be a practical alternative to column-level conflict resolution.
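
A minimal sketch of the vertical-partitioning alternative follows. It assumes an `id` key column, which the two-column example table doesn't have, and the names `t_a`, `t_b`, and `t_combined` are invented for illustration.

```sql
-- Hypothetical layout: each slice of columns lives in its own table.
CREATE TABLE t_a (id INT PRIMARY KEY, a INT);
CREATE TABLE t_b (id INT PRIMARY KEY, b INT);

-- A view can present the slices as one logical row.
CREATE VIEW t_combined AS
SELECT id, t_a.a, t_b.b
FROM t_a JOIN t_b USING (id);

-- Concurrent updates to a and b now touch rows in different tables,
-- so they no longer form an UPDATE-UPDATE conflict on the same row.
UPDATE t_a SET a = 100 WHERE id = 1;
UPDATE t_b SET b = 100 WHERE id = 1;
```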

Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The [bdr.alter_table_conflict_detection()](https://www.enterprisedb.com/docs/pgd/latest/reference/conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing.
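
Preparing a table might look like the following sketch. The argument order and the `'column_modify_timestamp'` method name are taken from the PGD reference as best understood here, and `cts` is only an example name for the column that holds the per-column timestamps. Check the linked reference for your PGD version before relying on them.

```sql
-- Assumes table t already exists on a PGD node.
ALTER TABLE t REPLICA IDENTITY FULL;

-- Assumed arguments: the table, the detection method, and the name of the
-- column that stores the per-column timestamps ('cts' is an example name).
SELECT bdr.alter_table_conflict_detection('t'::regclass, 'column_modify_timestamp', 'cts');
```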

## Special problems for column-level conflict resolution

By treating the columns independently, it's easy to violate constraints in a way that isn't possible when all changes happen on the same node. Consider, for example, a table like this:

```sql
CREATE TABLE t (id INT PRIMARY KEY, a INT, b INT, CHECK (a > b));
INSERT INTO t VALUES (1, 1000, 1);
```

Assume one node does:

```sql
UPDATE t SET a = 100;
```

Another node concurrently does:

```sql
UPDATE t SET b = 500;
```

Each of those updates is valid when executed on the initial row, so each passes on the node where it runs. But when the changes replicate to the other node, the merged row `(1, 100, 500)` violates the `CHECK (a > b)` constraint, and replication stops until the issue is resolved manually.
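
To make the violation concrete, this plain PostgreSQL query evaluates the `a > b` condition for the row as each node sees it locally and for the row that column-level merging would produce. The labels are illustrative only.

```sql
-- Evaluate the CHECK (a > b) condition for each candidate row.
SELECT label, a, b, a > b AS satisfies_check
FROM (VALUES
        ('node 1 after its own update', 100,  1),    -- true
        ('node 2 after its own update', 1000, 500),  -- true
        ('merged per column',           100,  500)   -- false: violates the constraint
     ) AS v(label, a, b);
```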

## Handling column-level conflicts using CRDT data types

By default, column-level conflict resolution picks the value with the higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. For example, you can use [CRDT types](../crdt) that allow merging the conflicting values without discarding any information.
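
The difference between discarding and merging can be illustrated with simple arithmetic in plain PostgreSQL. This sketch assumes a counter that starts at 10 and is incremented concurrently by 5 on one node and by 3 on another. It models a counter-style merge in general terms and doesn't use or describe PGD's actual CRDT types.

```sql
-- Hypothetical concurrent increments to a counter whose starting value is 10.
WITH updates(origin, delta, ts) AS (
  VALUES
    ('node1', 5, TIMESTAMPTZ '2024-01-01 10:00:01+00'),
    ('node2', 3, TIMESTAMPTZ '2024-01-01 10:00:02+00')
)
SELECT 10 + (SELECT delta FROM updates ORDER BY ts DESC LIMIT 1) AS newest_value_wins,  -- 13: node1's increment is lost
       10 + (SELECT sum(delta) FROM updates)                     AS merged_counter;     -- 18: both increments are kept
```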