From 41dc4c0831246339c871c6c3be4c8ff7dcc8f039 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Wed, 3 Jul 2024 10:49:30 -0400 Subject: [PATCH 01/31] Split out into files. --- .../01_overview_clcd.mdx | 34 +++++++++ .../02_enabling_disabling.mdx | 75 +++++++++++++++++++ .../column-level-conflicts/03_ddl_locking.mdx | 6 ++ .../column-level-conflicts/04_timestamps.mdx | 0 .../column-level-conflicts/05_clc_crdt.mdx | 3 + .../column-level-conflicts/index.mdx | 6 ++ 6 files changed, 124 insertions(+) create mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx create mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx create mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/03_ddl_locking.mdx create mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx create mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx create mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx new file mode 100644 index 00000000000..0da69565f20 --- /dev/null +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -0,0 +1,34 @@ +--- +navTitle: Overview +title: Overview +redirects: + - /pgd/latest/bdr/column-level-conflicts/ +--- + +By default, conflicts are resolved at row level. When changes from two nodes conflict, either the local or remote tuple is selected and the other is discarded. For example, commit timestamps for the two conflicting changes might be compared and the newer one kept. This approach ensures that all nodes converge to the same result and establishes commit-order-like semantics on the whole cluster. 
+ +However, in some cases it might be appropriate to resolve conflicts at the column level rather than the row level. + +Consider a simple example, in which table t has two integer columns a and b and a single row `(1,1)`. On one node execute: + +```sql +UPDATE t SET a = 100 +``` + +On another node, before receiving the preceding `UPDATE`, concurrently execute: + +```sql +UPDATE t SET b = 100 +``` + +This sequence results in an `UPDATE-UPDATE` conflict. With the `update_if_newer` conflict resolution, the commit timestamps are compared, and the new row version is kept. Assuming the second node committed last, the result is `(1,100)`, which effectively discards the change to column a. + +For many use cases, this behavior is the desired and expected. However, for some use cases, this might be an issue. Consider, for example, a multi-node cluster where each part of the application is connected to a different node, updating a dedicated subset of columns in a shared table. In that case, the different components might conflict and overwrite changes. + +For such use cases, it might be more appropriate to resolve conflicts on a given table at the column level. To achieve that, PGD tracks the timestamp of the last change for each column separately and uses that to pick the most recent value, essentially performing `update_if_newer`. + +Applied to the previous example, the result is `(100,100)` on both nodes, despite neither of the nodes ever seeing such a row. + +When thinking about column-level conflict resolution, it can be useful to see tables as vertically partitioned, so that each update affects data in only one slice. This approach eliminates conflicts between changes to different subsets of columns. In fact, vertical partitioning can even be a practical alternative to column-level conflict resolution. + +Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. 
The `bdr.alter_table_conflict_detection` function checks that and fails with an error if this setting is missing. \ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx new file mode 100644 index 00000000000..a4eacc6df13 --- /dev/null +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -0,0 +1,75 @@ +--- +navTitle: Enabling and disabling +title: Enabling and disabling +redirects: + - /pgd/latest/bdr/column-level-conflicts/ +--- + +## Enabling and disabling column-level conflict resolution + +!!! Note Permissions required +Column-level conflict detection uses the `column_timestamps` type. This type requires any user needing to detect column-level conflicts to have at least the [bdr_application](../security/pgd-predefined-roles/#bdr_application) role assigned. +!!! + +The [bdr.alter_table_conflict_detection()](conflicts#bdralter_table_conflict_detection) function manages column-level conflict resolution. + +### Example + +This example creates a table `test_table` and then enables column-level conflict resolution on it: + +```sql +db=# CREATE TABLE my_app.test_table (id SERIAL PRIMARY KEY, val INT); +CREATE TABLE + +db=# ALTER TABLE my_app.test_table REPLICA IDENTITY FULL; +ALTER TABLE + +db=# SELECT bdr.alter_table_conflict_detection( +db(# 'my_app.test_table'::regclass, +db(# 'column_modify_timestamp', 'cts'); + alter_table_conflict_detection +-------------------------------- + t + +db=# \d my_app.test_table +``` + +The function adds a `cts` column as specified in the function call. It also creates two triggers (`BEFORE INSERT` and `BEFORE UPDATE`) that are responsible for maintaining timestamps in the new column before each change. + +The new column specifies `NOT NULL` with a default value, which means that `ALTER TABLE ... ADD COLUMN` doesn't perform a table rewrite. 
+ +!!! Note + Avoid using columns with the `bdr.column_timestamps` data type for other purposes, as doing so can have negative effects. For example, it switches the table to column-level conflict resolution, which doesn't work correctly without the triggers. + +### Listing tables with column-level conflict resolution + +You can list tables having column-level conflict resolution enabled with the following query. This query detects the presence of a column of type `bdr.column_timestamps`. + +```sql +SELECT nc.nspname, c.relname +FROM pg_attribute a +JOIN (pg_class c JOIN pg_namespace nc ON c.relnamespace = nc.oid) + ON a.attrelid = c.oid +JOIN (pg_type t JOIN pg_namespace nt ON t.typnamespace = nt.oid) + ON a.atttypid = t.oid +WHERE NOT pg_is_other_temp_schema(nc.oid) + AND nt.nspname = 'bdr' + AND t.typname = 'column_timestamps' + AND NOT a.attisdropped + AND c.relkind IN ('r', 'v', 'f', 'p'); +``` + +### bdr.column_timestamps_create + +This function creates column-level conflict resolution. It's called within `column_timestamps_enable`. + +#### Synopsis + +```sql +bdr.column_timestamps_create(p_source cstring, p_timestamp timestamptz) +``` + +#### Parameters + +- `p_source` — The two options are `current` or `commit`. +- `p_timestamp` — Timestamp depends on the source chosen. If `commit`, then `TIMESTAMP_SOURCE_COMMIT`. If `current`, then `TIMESTAMP_SOURCE_CURRENT`. 
\ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_ddl_locking.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_ddl_locking.mdx new file mode 100644 index 00000000000..9f1889d16b7 --- /dev/null +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_ddl_locking.mdx @@ -0,0 +1,6 @@ +--- +navTitle: DDL locking +title: DDL locking +redirects: + - /pgd/latest/bdr/column-level-conflicts/ +--- \ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx new file mode 100644 index 00000000000..e69de29bb2d diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx new file mode 100644 index 00000000000..4a97dc93dfa --- /dev/null +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx @@ -0,0 +1,3 @@ +## Handling column conflicts using CRDT data types + +By default, column-level conflict resolution picks the value with a higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. For example, you can use CRDT types that allow merging the conflicting values without discarding any information. 
\ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx new file mode 100644 index 00000000000..5a45132b11c --- /dev/null +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx @@ -0,0 +1,6 @@ +--- +navTitle: Column-level conflict resolution +title: Column-level conflict detection +redirects: + - /pgd/latest/bdr/column-level-conflicts/ +--- \ No newline at end of file From 516b0a8b7e4b52dc067881d5522564072d8cbdf7 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Wed, 3 Jul 2024 10:52:08 -0400 Subject: [PATCH 02/31] Changes in timestamps file --- .../column-level-conflicts/04_timestamps.mdx | 81 +++++++++++++++++++ 1 file changed, 81 insertions(+) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx index e69de29bb2d..b16a01c343c 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx @@ -0,0 +1,81 @@ +## Current versus commit timestamp + +An important decision is the timestamp to assign to modified columns. + +By default, the timestamp assigned to modified columns is the current timestamp, as if obtained from `clock_timestamp`. This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns). + +It can, however, have various unexpected effects: + +- The timestamp changes during statement execution. So, if an `UPDATE` affects multiple rows, each gets a slightly different timestamp. This means that the effects of concurrent changes might get "mixed" in various ways, depending on how the changes performed on different nodes interleave. + +- The timestamp is unrelated to the commit timestamp. 
Using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it likely can't be serialized. + +!!! Note + Statement and transaction timestamps might be added in the future, which will address issues with mixing effects of concurrent statements or transactions. Still, neither of these options can ever produce results equivalent to commit order. + +You can also use the actual commit timestamp, although this feature is considered experimental. To use the commit timestamp, set the last parameter to `true` when enabling column-level conflict resolution: + +```sql +SELECT bdr.column_timestamps_enable('test_table'::regclass, 'cts', true); +``` + +You can disable it using `bdr.column_timestamps_disable`. + +Commit timestamps currently have restrictions that are explained in [Notes](#notes). + +## Inspecting column timestamps + +The column storing timestamps for modified columns is maintained by triggers. Don't modify it directly. It can be useful to inspect the current timestamps value, for example, while investigating how a conflict was resolved. 
+ +Three functions are useful for this purpose: + +- `bdr.column_timestamps_to_text(bdr.column_timestamps)` + + This function returns a human-readable representation of the timestamp mapping and + is used when casting the value to `text`: + +```sql +db=# select cts::text from test_table; + cts +----------------------------------------------------------------------------------------------------- + {source: current, default: 2018-09-23 19:24:52.118583+02, map: [2 : 2018-09-23 19:25:02.590677+02]} +(1 row) + +``` + +- `bdr.column_timestamps_to_jsonb(bdr.column_timestamps)` + + This function returns a JSONB representation of the timestamps mapping and is used + when casting the value to `jsonb`: + +```sql +db=# select jsonb_pretty(cts::jsonb) from test_table; + jsonb_pretty +--------------------------------------------------- + { + + "map": { + + "2": "2018-09-23T19:24:52.118583+02:00" + + }, + + "source": "current", + + "default": "2018-09-23T19:24:52.118583+02:00"+ + } +(1 row) +``` + +- `bdr.column_timestamps_resolve(bdr.column_timestamps, xid)` + + This function updates the mapping with the commit timestamp for the attributes modified by the most recent transaction if it already committed. This matters only when using the commit timestamp. 
For example, in this case, the last transaction updated the second attribute (with `attnum = 2`): + +```sql +test=# select cts::jsonb from test_table; + cts +---------------------------------------------------------------------------------------------------------------------------------------- + {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00", "modified": [2]} +(1 row) + +db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; + column_timestamps_resolve +----------------------------------------------------------------------------------------------------------------------- + {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00"} +(1 row) +``` \ No newline at end of file From d3c3e4ebec60b530c2b259fade0ca977149c6826 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Mon, 8 Jul 2024 14:07:56 -0400 Subject: [PATCH 03/31] Removed DDL locking as stand-alone page. 
--- .../column-level-conflicts/02_enabling_disabling.mdx | 6 +++++- .../consistency/column-level-conflicts/03_ddl_locking.mdx | 6 ------ .../5/consistency/column-level-conflicts/04_timestamps.mdx | 7 +++++++ .../5/consistency/column-level-conflicts/05_clc_crdt.mdx | 7 ++++++- 4 files changed, 18 insertions(+), 8 deletions(-) delete mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/03_ddl_locking.mdx diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index a4eacc6df13..c59a7839c2a 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -72,4 +72,8 @@ bdr.column_timestamps_create(p_source cstring, p_timestamp timestampstz) #### Parameters - `p_source` — The two options are `current` or `commit`. -- `p_timestamp` — Timestamp depends on the source chosen. If `commit`, then `TIMESTAMP_SOURCE_COMMIT`. If `current`, then `TIMESTAMP_SOURCE_CURRENT`. \ No newline at end of file +- `p_timestamp` — Timestamp depends on the source chosen. If `commit`, then `TIMESTAMP_SOURCE_COMMIT`. If `current`, then `TIMESTAMP_SOURCE_CURRENT`. + +!!! Note +When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. +!!! 
\ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_ddl_locking.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_ddl_locking.mdx deleted file mode 100644 index 9f1889d16b7..00000000000 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_ddl_locking.mdx +++ /dev/null @@ -1,6 +0,0 @@ ---- -navTitle: DDL locking -title: DDL locking -redirects: - - /pgd/latest/bdr/column-level-conflicts/ ---- \ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx index b16a01c343c..15c44b2106e 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx @@ -1,3 +1,10 @@ +--- +navTitle: Timestamps +title: Timestamps +redirects: + - /pgd/latest/bdr/column-level-conflicts/ +--- + ## Current versus commit timestamp An important decision is the timestamp to assign to modified columns. diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx index 4a97dc93dfa..fc89fa290e7 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx @@ -1,3 +1,8 @@ -## Handling column conflicts using CRDT data types +--- +navTitle: Using CRDT data types +title: Handling column-level conflicts using CRDT data types +redirects: + - /pgd/latest/bdr/column-level-conflicts/ +--- By default, column-level conflict resolution picks the value with a higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. 
For example, you can use CRDT types that allow merging the conflicting values without discarding any information. \ No newline at end of file From 44a4a786e5480f40efe9124e64bd757adfd44a04 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Mon, 8 Jul 2024 15:04:11 -0400 Subject: [PATCH 04/31] fixed index links: --- .../column-level-conflicts/01_overview_clcd.mdx | 6 +++--- .../column-level-conflicts/02_enabling_disabling.mdx | 7 +++---- .../5/consistency/column-level-conflicts/index.mdx | 11 ++++++++++- 3 files changed, 16 insertions(+), 8 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index 0da69565f20..599d8c76e73 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -7,9 +7,9 @@ redirects: By default, conflicts are resolved at row level. When changes from two nodes conflict, either the local or remote tuple is selected and the other is discarded. For example, commit timestamps for the two conflicting changes might be compared and the newer one kept. This approach ensures that all nodes converge to the same result and establishes commit-order-like semantics on the whole cluster. -However, in some cases it might be appropriate to resolve conflicts at the column level rather than the row level. +However, it might sometimes be appropriate to resolve conflicts at the column level rather than the row level. -Consider a simple example, in which table t has two integer columns a and b and a single row `(1,1)`. On one node execute: +Consider a simple example in which table t has two integer columns, a and b, and a single row `(1,1)`. On one node execute: ```sql UPDATE t SET a = 100 @@ -23,7 +23,7 @@ UPDATE t SET b = 100 This sequence results in an `UPDATE-UPDATE` conflict. 
With the `update_if_newer` conflict resolution, the commit timestamps are compared, and the new row version is kept. Assuming the second node committed last, the result is `(1,100)`, which effectively discards the change to column a. -For many use cases, this behavior is the desired and expected. However, for some use cases, this might be an issue. Consider, for example, a multi-node cluster where each part of the application is connected to a different node, updating a dedicated subset of columns in a shared table. In that case, the different components might conflict and overwrite changes. +For many use cases, this behavior is desired and expected. However, for some use cases, this might be an issue. Consider, for example, a multi-node cluster where each part of the application is connected to a different node, updating a dedicated subset of columns in a shared table. In that case, the different components might conflict and overwrite changes. For such use cases, it might be more appropriate to resolve conflicts on a given table at the column level. To achieve that, PGD tracks the timestamp of the last change for each column separately and uses that to pick the most recent value, essentially performing `update_if_newer`. diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index c59a7839c2a..2908cbc1c53 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -1,12 +1,11 @@ --- navTitle: Enabling and disabling -title: Enabling and disabling +title: Enabling and disabling column-level conflict resolution +deepToC: true redirects: - /pgd/latest/bdr/column-level-conflicts/ --- -## Enabling and disabling column-level conflict resolution - !!! 
Note Permissions required Column-level conflict detection uses the `column_timestamps` type. This type requires any user needing to detect column-level conflicts to have at least the [bdr_application](../security/pgd-predefined-roles/#bdr_application) role assigned. !!! @@ -74,6 +73,6 @@ bdr.column_timestamps_create(p_source cstring, p_timestamp timestampstz) - `p_source` — The two options are `current` or `commit`. - `p_timestamp` — Timestamp depends on the source chosen. If `commit`, then `TIMESTAMP_SOURCE_COMMIT`. If `current`, then `TIMESTAMP_SOURCE_CURRENT`. -!!! Note +!!! Note DDL locking When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. !!! \ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx index 5a45132b11c..03d5b770e97 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx @@ -3,4 +3,13 @@ navTitle: Column-level conflict resolution title: Column-level conflict detection redirects: - /pgd/latest/bdr/column-level-conflicts/ ---- \ No newline at end of file +--- + + +* [Overview](01_overview_clcd) introduces the notion of a column-level conflict in contrast to row-level conflicts. 
+ +* [Enabling and disabling](02_enabling_disabling) + +* [Timestamps](04_timestamps) + +* [Handling column-level conflicts using CRDT data types](05_clc_crdt) \ No newline at end of file From dfedb14a06c71353fb3bdd2c6b96e1b3050a5b95 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Mon, 8 Jul 2024 15:49:31 -0400 Subject: [PATCH 05/31] Added description to items in index.mdx --- .../docs/pgd/5/consistency/column-level-conflicts/index.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx index 03d5b770e97..8a26f50f8e4 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx @@ -8,8 +8,8 @@ redirects: * [Overview](01_overview_clcd) introduces the notion of a column-level conflict in contrast to row-level conflicts. -* [Enabling and disabling](02_enabling_disabling) +* [Enabling and disabling](02_enabling_disabling) provides an example of enabling column-level conflict resolution and introduces [`bdr.column_timestamps_create`](02_enabling_disabling/#bdrcolumn_timestamps_create). -* [Timestamps](04_timestamps) +* [Timestamps](04_timestamps) explains how timestamps can be selected and inspected. -* [Handling column-level conflicts using CRDT data types](05_clc_crdt) \ No newline at end of file +* [Handling column-level conflicts using CRDT data types](05_clc_crdt) notes how column-level conflict resolution can reconcile using CRDT types that allow merging conflicts. From 4d0c215c09a1acb928ff8ef031c0b156cf35d831 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Tue, 9 Jul 2024 13:51:26 -0400 Subject: [PATCH 06/31] Started integrating notes section. 
--- .../01_overview_clcd.mdx | 26 ++++++++++++++++++- .../02_enabling_disabling.mdx | 5 ++-- .../column-level-conflicts/04_timestamps.mdx | 21 ++++++++++++++- 3 files changed, 48 insertions(+), 4 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index 599d8c76e73..1c97a831486 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -31,4 +31,28 @@ Applied to the previous example, the result is `(100,100)` on both nodes, despit When thinking about column-level conflict resolution, it can be useful to see tables as vertically partitioned, so that each update affects data in only one slice. This approach eliminates conflicts between changes to different subsets of columns. In fact, vertical partitioning can even be a practical alternative to column-level conflict resolution. -Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The `bdr.alter_table_conflict_detection` function checks that and fails with an error if this setting is missing. \ No newline at end of file +Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The `bdr.alter_table_conflict_detection` function checks that and fails with an error if this setting is missing. + + + + +- By treating the columns independently, it's easy to violate constraints in a way that isn't possible when all changes happen on the same node. 
Consider, for example, a table like this: + + ```sql + CREATE TABLE t (id INT PRIMARY KEY, a INT, b INT, CHECK (a > b)); + INSERT INTO t VALUES (1, 1000, 1); + ``` + + Assume one node does: + + ```sql + UPDATE t SET a = 100; + ``` + + Another node concurrently does: + + ```sql + UPDATE t SET b = 500; + ``` + + Each of those updates is valid when executed on the initial row and so passes on each node. But when replicating to the other node, the resulting row violates the `CHECK (a > b)` constraint, and the replication stops until the issue is resolved manually. \ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index 2908cbc1c53..752ba728ed2 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -10,7 +10,7 @@ redirects: Column-level conflict detection uses the `column_timestamps` type. This type requires any user needing to detect column-level conflicts to have at least the [bdr_application](../security/pgd-predefined-roles/#bdr_application) role assigned. !!! -The [bdr.alter_table_conflict_detection()](conflicts#bdralter_table_conflict_detection) function manages column-level conflict resolution. +The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions/bdralter_table_conflict_detection) function manages column-level conflict resolution. ### Example @@ -75,4 +75,5 @@ bdr.column_timestamps_create(p_source cstring, p_timestamp timestampstz) !!! Note DDL locking When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. 
Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. -!!! \ No newline at end of file +!!! + diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx index 15c44b2106e..283434ac9bc 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx @@ -28,11 +28,20 @@ SELECT bdr.column_timestamps_enable('test_table'::regclass, 'cts', true); You can disable it using `bdr.column_timestamps_disable`. +!!! Note +When using regular timestamps to order changes or commits, the conflicting changes might have exactly the same timestamp because two or more nodes happened to generate the same timestamp. This risk isn't unique to column-level conflict resolution, as it can happen even for regular row-level conflict resolution. The node id is the tie breaker in this situation. The higher node id wins. This approach ensures that the same changes are applied on all nodes. +!!! + Commit timestamps currently have restrictions that are explained in [Notes](#notes). ## Inspecting column timestamps -The column storing timestamps for modified columns is maintained by triggers. Don't modify it directly. It can be useful to inspect the current timestamps value, for example, while investigating how a conflict was resolved. +The column storing timestamps for modified columns is maintained by triggers. Don't modify it directly. It can be useful to inspect the current timestamps value, for example, while investigating how a conflict was resolved. + +!!! Note +While the timestamp mapping is maintained by triggers, the order in which triggers execute matters. 
So if you have custom triggers that modify tuples and are executed after the `pgl_clcd_` triggers, the modified columns aren't detected correctly. +!!! + Three functions are useful for this purpose: @@ -85,4 +94,14 @@ db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; ----------------------------------------------------------------------------------------------------------------------- {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00"} (1 row) +``` + +- A clock skew can occur between different nodes. It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. +- The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. (The value is ignored where possible.) +- For `INSERT` statements, there's no old row to compare the new one to, so all attributes are considered to be modified, and they are assigned a new timestamp. This condition applies even for columns that weren't included in the `INSERT` statement and received default values. PGD can detect the attributes that have a default value but can't know if it was included automatically or specified explicitly. +- The attributes modified by an `UPDATE` are determined by comparing the old and new row in a trigger. This means that if the attribute doesn't change a value, it isn't detected as modified even if it's explicitly set. For example, `UPDATE t SET a = a` doesn't mark `a` as modified for any row. Similarly, `UPDATE t SET a = 1` doesn't mark `a` as modified for rows that are already set to `1`. +- A clock skew can occur between different nodes. 
It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. + +```sql +SELECT bdr.alter_node_group_config('group', ignore_redundant_updates := false); ``` \ No newline at end of file From d0ce177fb05c31192ee5bd42da8b7838321097cb Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Tue, 9 Jul 2024 14:56:38 -0400 Subject: [PATCH 07/31] Continue integrating notes. --- .../column-level-conflicts/04_timestamps.mdx | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx index 283434ac9bc..30d562c3295 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx @@ -11,6 +11,14 @@ An important decision is the timestamp to assign to modified columns. By default, the timestamp assigned to modified columns is the current timestamp, as if obtained from `clock_timestamp`. This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns). +!!! Note +A clock skew can occur between different nodes. It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. +!!! + +!!! Note +The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. (The value is ignored where possible.) +!!! 
+ It can, however, have various unexpected effects: - The timestamp changes during statement execution. So, if an `UPDATE` affects multiple rows, each gets a slightly different timestamp. This means that the effects of concurrent changes might get "mixed" in various ways, depending on how the changes performed on different nodes interleave. @@ -97,10 +105,8 @@ db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; ``` - A clock skew can occur between different nodes. It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. -- The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. (The value is ignored where possible.) - For `INSERT` statements, there's no old row to compare the new one to, so all attributes are considered to be modified, and they are assigned a new timestamp. This condition applies even for columns that weren't included in the `INSERT` statement and received default values. PGD can detect the attributes that have a default value but can't know if it was included automatically or specified explicitly. - The attributes modified by an `UPDATE` are determined by comparing the old and new row in a trigger. This means that if the attribute doesn't change a value, it isn't detected as modified even if it's explicitly set. For example, `UPDATE t SET a = a` doesn't mark `a` as modified for any row. Similarly, `UPDATE t SET a = 1` doesn't mark `a` as modified for rows that are already set to `1`. -- A clock skew can occur between different nodes. It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. 
However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. ```sql SELECT bdr.alter_node_group_config('group', ignore_redundant_updates := false); From 5cd4fe20bb772abcafcea3239fe632ebefd4ad30 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Wed, 10 Jul 2024 10:11:38 -0400 Subject: [PATCH 08/31] Small changes. --- .../column-level-conflicts/04_timestamps.mdx | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx index 30d562c3295..95f10450395 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx @@ -7,7 +7,11 @@ redirects: ## Current versus commit timestamp -An important decision is the timestamp to assign to modified columns. +An important decision is which timestamp to assign to modified columns. + +!!! Note +The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. (The value is ignored where possible.) +!!! By default, the timestamp assigned to modified columns is the current timestamp, as if obtained from `clock_timestamp`. This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns). @@ -15,14 +19,11 @@ By default, the timestamp assigned to modified columns is the current timestamp, A clock skew can occur between different nodes. It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. !!! -!!! 
Note -The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. (The value is ignored where possible.) -!!! - It can, however, have various unexpected effects: - The timestamp changes during statement execution. So, if an `UPDATE` affects multiple rows, each gets a slightly different timestamp. This means that the effects of concurrent changes might get "mixed" in various ways, depending on how the changes performed on different nodes interleave. + - The timestamp is unrelated to the commit timestamp. Using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it likely can't be serialized. !!! Note @@ -40,17 +41,14 @@ You can disable it using `bdr.column_timestamps_disable`. When using regular timestamps to order changes or commits, the conflicting changes might have exactly the same timestamp because two or more nodes happened to generate the same timestamp. This risk isn't unique to column-level conflict resolution, as it can happen even for regular row-level conflict resolution. The node id is the tie breaker in this situation. The higher node id wins. This approach ensures that the same changes are applied on all nodes. !!! -Commit timestamps currently have restrictions that are explained in [Notes](#notes). - ## Inspecting column timestamps The column storing timestamps for modified columns is maintained by triggers. Don't modify it directly. It can be useful to inspect the current timestamps value, for example, while investigating how a conflict was resolved. !!! Note -While the timestamp mapping is maintained by triggers, the order in which triggers execute matters. So if you have custom triggers that modify tuples and are executed after the `pgl_clcd_` triggers, the modified columns aren't detected correctly. +The timestamp mapping is maintained by triggers and the order in which triggers execute matters. 
So if you have custom triggers that modify tuples and are executed after the `pgl_clcd_` triggers, the modified columns aren't detected correctly. !!! - Three functions are useful for this purpose: - `bdr.column_timestamps_to_text(bdr.column_timestamps)` @@ -104,7 +102,6 @@ db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; (1 row) ``` -- A clock skew can occur between different nodes. It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. - For `INSERT` statements, there's no old row to compare the new one to, so all attributes are considered to be modified, and they are assigned a new timestamp. This condition applies even for columns that weren't included in the `INSERT` statement and received default values. PGD can detect the attributes that have a default value but can't know if it was included automatically or specified explicitly. - The attributes modified by an `UPDATE` are determined by comparing the old and new row in a trigger. This means that if the attribute doesn't change a value, it isn't detected as modified even if it's explicitly set. For example, `UPDATE t SET a = a` doesn't mark `a` as modified for any row. Similarly, `UPDATE t SET a = 1` doesn't mark `a` as modified for rows that are already set to `1`. From 02de691d8dc8d508f21eca58c257285928e2a266 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Wed, 10 Jul 2024 14:33:26 -0400 Subject: [PATCH 09/31] Removed some notes and integrated some. 
--- .../pgd/5/consistency/column-level-conflicts/04_timestamps.mdx | 3 --- .../docs/pgd/5/consistency/column-level-conflicts/index.mdx | 3 +++ 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx index 95f10450395..ffa6ca35f7f 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx @@ -105,6 +105,3 @@ db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; - For `INSERT` statements, there's no old row to compare the new one to, so all attributes are considered to be modified, and they are assigned a new timestamp. This condition applies even for columns that weren't included in the `INSERT` statement and received default values. PGD can detect the attributes that have a default value but can't know if it was included automatically or specified explicitly. - The attributes modified by an `UPDATE` are determined by comparing the old and new row in a trigger. This means that if the attribute doesn't change a value, it isn't detected as modified even if it's explicitly set. For example, `UPDATE t SET a = a` doesn't mark `a` as modified for any row. Similarly, `UPDATE t SET a = 1` doesn't mark `a` as modified for rows that are already set to `1`. 
-```sql -SELECT bdr.alter_node_group_config('group', ignore_redundant_updates := false); -``` \ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx index 8a26f50f8e4..eaad26db1d4 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx @@ -5,6 +5,9 @@ redirects: - /pgd/latest/bdr/column-level-conflicts/ --- +By default, conflicts are resolved at row level. When changes from two nodes conflict, either the local or remote tuple is selected and the other is discarded. For example, commit timestamps for the two conflicting changes might be compared and the newer one kept. This approach ensures that all nodes converge to the same result and establishes commit-order-like semantics on the whole cluster. + +However, in some cases it might be appropriate to resolve conflicts at the column level rather than the row level. * [Overview](01_overview_clcd) introduces the notion of a column-level conflict in contrast to row-level conflicts. From addf1404cce629be850dcad7d7349f2d4b0c8492 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Wed, 10 Jul 2024 14:52:37 -0400 Subject: [PATCH 10/31] Cleaned up note integration.
--- .../column-level-conflicts/01_overview_clcd.mdx | 11 ++++++----- .../column-level-conflicts/04_timestamps.mdx | 12 ++++-------- 2 files changed, 10 insertions(+), 13 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index 1c97a831486..99e844bee91 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -21,6 +21,10 @@ On another node, before receiving the preceding `UPDATE`, concurrently execute: UPDATE t SET b = 100 ``` +!!! Note +The attributes modified by an `UPDATE` are determined by comparing the old and new row in a trigger. This means that if the attribute doesn't change a value, it isn't detected as modified even if it's explicitly set. For example, `UPDATE t SET a = a` doesn't mark `a` as modified for any row. Similarly, `UPDATE t SET a = 1` doesn't mark `a` as modified for rows that are already set to `1`. +!!! + This sequence results in an `UPDATE-UPDATE` conflict. With the `update_if_newer` conflict resolution, the commit timestamps are compared, and the new row version is kept. Assuming the second node committed last, the result is `(1,100)`, which effectively discards the change to column a. For many use cases, this behavior is desired and expected. However, for some use cases, this might be an issue. Consider, for example, a multi-node cluster where each part of the application is connected to a different node, updating a dedicated subset of columns in a shared table. In that case, the different components might conflict and overwrite changes. @@ -33,10 +37,7 @@ When thinking about column-level conflict resolution, it can be useful to see ta Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. 
The `bdr.alter_table_conflict_detection` function checks that and fails with an error if this setting is missing. - - - -- By treating the columns independently, it's easy to violate constraints in a way that isn't possible when all changes happen on the same node. Consider, for example, a table like this: +By treating the columns independently, it's easy to violate constraints in a way that isn't possible when all changes happen on the same node. Consider, for example, a table like this: ```sql CREATE TABLE t (id INT PRIMARY KEY, a INT, b INT, CHECK (a > b)); @@ -55,4 +56,4 @@ Column-level conflict resolution requires the table to have `REPLICA IDENTITY FU UPDATE t SET b = 500; ``` - Each of those updates is valid when executed on the initial row and so passes on each node. But when replicating to the other node, the resulting row violates the `CHECK (A > b)` constraint, and the replication stops until the issue is resolved manually. \ No newline at end of file +Each of those updates is valid when executed on the initial row and so passes on each node. But when replicating to the other node, the resulting row violates the `CHECK (A > b)` constraint, and the replication stops until the issue is resolved manually. diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx index ffa6ca35f7f..ef2d4f016ee 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx @@ -15,19 +15,19 @@ The column storing timestamp mapping is managed automatically. Don't specify or By default, the timestamp assigned to modified columns is the current timestamp, as if obtained from `clock_timestamp`. This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns). -!!! 
Note -A clock skew can occur between different nodes. It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. -!!! - It can, however, have various unexpected effects: - The timestamp changes during statement execution. So, if an `UPDATE` affects multiple rows, each gets a slightly different timestamp. This means that the effects of concurrent changes might get "mixed" in various ways, depending on how the changes performed on different nodes interleave. +!!! Note +A clock skew can occur between different nodes. It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. +!!! - The timestamp is unrelated to the commit timestamp. Using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it likely can't be serialized. !!! Note Statement and transaction timestamps might be added in the future, which will address issues with mixing effects of concurrent statements or transactions. Still, neither of these options can ever produce results equivalent to commit order. +!!! You can also use the actual commit timestamp, although this feature is considered experimental. To use the commit timestamp, set the last parameter to `true` when enabling column-level conflict resolution: @@ -101,7 +101,3 @@ db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00"} (1 row) ``` - -- For `INSERT` statements, there's no old row to compare the new one to, so all attributes are considered to be modified, and they are assigned a new timestamp. 
This condition applies even for columns that weren't included in the `INSERT` statement and received default values. PGD can detect the attributes that have a default value but can't know if it was included automatically or specified explicitly. -- The attributes modified by an `UPDATE` are determined by comparing the old and new row in a trigger. This means that if the attribute doesn't change a value, it isn't detected as modified even if it's explicitly set. For example, `UPDATE t SET a = a` doesn't mark `a` as modified for any row. Similarly, `UPDATE t SET a = 1` doesn't mark `a` as modified for rows that are already set to `1`. - From 29e6142bf883843aa1a9eed226cec96ad3fe381a Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Wed, 10 Jul 2024 15:01:44 -0400 Subject: [PATCH 11/31] Moved column_timestamps_create to reference. --- .../02_enabling_disabling.mdx | 15 --------------- .../column-level-conflicts/04_timestamps.mdx | 2 +- product_docs/docs/pgd/5/reference/clcd.mdx | 19 +++++++++++++++++++ product_docs/docs/pgd/5/reference/index.json | 11 ++++++++--- product_docs/docs/pgd/5/reference/index.mdx | 16 ++++++++++++---- 5 files changed, 40 insertions(+), 23 deletions(-) create mode 100644 product_docs/docs/pgd/5/reference/clcd.mdx diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index 752ba728ed2..581b80c5eaf 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -58,21 +58,6 @@ WHERE NOT pg_is_other_temp_schema(nc.oid) AND c.relkind IN ('r', 'v', 'f', 'p'); ``` -### bdr.column_timestamps_create - -This function creates column-level conflict resolution. It's called within `column_timestamp_enable`. 
- -#### Synopsis - -```sql -bdr.column_timestamps_create(p_source cstring, p_timestamp timestampstz) -``` - -#### Parameters - -- `p_source` — The two options are `current` or `commit`. -- `p_timestamp` — Timestamp depends on the source chosen. If `commit`, then `TIMESTAMP_SOURCE_COMMIT`. If `current`, then `TIMESTAMP_SOURCE_CURRENT`. - !!! Note DDL locking When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. !!! diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx index ef2d4f016ee..129d1765ca0 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx @@ -26,7 +26,7 @@ A clock skew can occur between different nodes. It can induce somewhat unexpecte - The timestamp is unrelated to the commit timestamp. Using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it likely can't be serialized. !!! Note - Statement and transaction timestamps might be added in the future, which will address issues with mixing effects of concurrent statements or transactions. Still, neither of these options can ever produce results equivalent to commit order. +Statement and transaction timestamps might be added in the future, which will address issues with mixing effects of concurrent statements or transactions. Still, neither of these options can ever produce results equivalent to commit order. !!! 
You can also use the actual commit timestamp, although this feature is considered experimental. To use the commit timestamp, set the last parameter to `true` when enabling column-level conflict resolution: diff --git a/product_docs/docs/pgd/5/reference/clcd.mdx b/product_docs/docs/pgd/5/reference/clcd.mdx new file mode 100644 index 00000000000..0cb9c7cca2c --- /dev/null +++ b/product_docs/docs/pgd/5/reference/clcd.mdx @@ -0,0 +1,19 @@ +--- +title: Column-level conflict detection +indexdepth: 2 +--- + +## bdr.column_timestamps_create + +This function creates column-level conflict resolution. It's called within `column_timestamp_enable`. + +### Synopsis + +```sql +bdr.column_timestamps_create(p_source cstring, p_timestamp timestamptz) +``` + +### Parameters + +- `p_source` — The two options are `current` or `commit`. +- `p_timestamp` — Timestamp depends on the source chosen. If `commit`, then `TIMESTAMP_SOURCE_COMMIT`. If `current`, then `TIMESTAMP_SOURCE_CURRENT`. \ No newline at end of file diff --git a/product_docs/docs/pgd/5/reference/index.json b/product_docs/docs/pgd/5/reference/index.json index e1f4640e238..b5a6111189f 100644 --- a/product_docs/docs/pgd/5/reference/index.json +++ b/product_docs/docs/pgd/5/reference/index.json @@ -319,7 +319,12 @@ "bdrtaskmgr_set_leader": "/pgd/latest/reference/functions-internal#bdrtaskmgr_set_leader", "bdrtaskmgr_get_last_completed_workitem": "/pgd/latest/reference/functions-internal#bdrtaskmgr_get_last_completed_workitem", "bdrtaskmgr_work_queue_check_status": "/pgd/latest/reference/functions-internal#bdrtaskmgr_work_queue_check_status", - "bdralter_table_conflict_detection": "/pgd/latest/reference/conflict_functions#bdralter_table_conflict_detection", - "bdralter_node_set_conflict_resolver": "/pgd/latest/reference/conflict_functions#bdralter_node_set_conflict_resolver", - "bdralter_node_set_log_config": "/pgd/latest/reference/conflict_functions#bdralter_node_set_log_config" + "bdrcolumn_timestamps_create":
"/pgd/latest/reference/clcd#bdrcolumn_timestamps_create", + "conflict-detection": "/pgd/latest/reference/conflicts#conflict-detection", + "list-of-conflict-types": "/pgd/latest/reference/conflicts#list-of-conflict-types", + "conflict-resolution": "/pgd/latest/reference/conflicts#conflict-resolution", + "list-of-conflict-resolvers": "/pgd/latest/reference/conflicts#list-of-conflict-resolvers", + "default-conflict-resolvers": "/pgd/latest/reference/conflicts#default-conflict-resolvers", + "list-of-conflict-resolutions": "/pgd/latest/reference/conflicts#list-of-conflict-resolutions", + "conflict-logging": "/pgd/latest/reference/conflicts#conflict-logging" } \ No newline at end of file diff --git a/product_docs/docs/pgd/5/reference/index.mdx b/product_docs/docs/pgd/5/reference/index.mdx index 6ce731e3436..be0361b0926 100644 --- a/product_docs/docs/pgd/5/reference/index.mdx +++ b/product_docs/docs/pgd/5/reference/index.mdx @@ -436,7 +436,15 @@ The reference section is a definitive listing of all functions, views, and comma * [`bdr.taskmgr_work_queue_check_status`](functions-internal#bdrtaskmgr_work_queue_check_status) -## [Conflict functions](conflict_functions) - * [`bdr.alter_table_conflict_detection`](conflict_functions#bdralter_table_conflict_detection) - * [`bdr.alter_node_set_conflict_resolver`](conflict_functions#bdralter_node_set_conflict_resolver) - * [`bdr.alter_node_set_log_config`](conflict_functions#bdralter_node_set_log_config) +## [Column-level conflict detection](clcd) + * [bdr.column_timestamps_create](clcd#bdrcolumn_timestamps_create) + + +## [Conflicts](conflicts) + * [Conflict detection](conflicts#conflict-detection) + * [List of conflict types](conflicts#list-of-conflict-types) + * [Conflict resolution](conflicts#conflict-resolution) + * [List of conflict resolvers](conflicts#list-of-conflict-resolvers) + * [Default conflict resolvers](conflicts#default-conflict-resolvers) + * [List of conflict resolutions](conflicts#list-of-conflict-resolutions) + 
* [Conflict logging](conflicts#conflict-logging) From bc788c36c9df9a65ddc55b37356b8f1256adc4eb Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Wed, 10 Jul 2024 16:13:37 -0400 Subject: [PATCH 12/31] Added pages to index.mdx.src --- product_docs/docs/pgd/5/reference/clcd.mdx | 4 ++-- product_docs/docs/pgd/5/reference/index.json | 3 +++ product_docs/docs/pgd/5/reference/index.mdx | 14 ++++++++++++-- product_docs/docs/pgd/5/reference/index.mdx.src | 4 ++++ 4 files changed, 21 insertions(+), 4 deletions(-) diff --git a/product_docs/docs/pgd/5/reference/clcd.mdx b/product_docs/docs/pgd/5/reference/clcd.mdx index 0cb9c7cca2c..44859ecfdfc 100644 --- a/product_docs/docs/pgd/5/reference/clcd.mdx +++ b/product_docs/docs/pgd/5/reference/clcd.mdx @@ -1,9 +1,9 @@ --- -title: Column-level conflict detection +title: Column-level conflict functions indexdepth: 2 --- -## bdr.column_timestamps_create +## `bdr.column_timestamps_create` This function creates column-level conflict resolution. It's called within `column_timestamp_enable`. 
diff --git a/product_docs/docs/pgd/5/reference/index.json b/product_docs/docs/pgd/5/reference/index.json index b5a6111189f..6ff2dd2940f 100644 --- a/product_docs/docs/pgd/5/reference/index.json +++ b/product_docs/docs/pgd/5/reference/index.json @@ -319,6 +319,9 @@ "bdrtaskmgr_set_leader": "/pgd/latest/reference/functions-internal#bdrtaskmgr_set_leader", "bdrtaskmgr_get_last_completed_workitem": "/pgd/latest/reference/functions-internal#bdrtaskmgr_get_last_completed_workitem", "bdrtaskmgr_work_queue_check_status": "/pgd/latest/reference/functions-internal#bdrtaskmgr_work_queue_check_status", + "bdralter_table_conflict_detection": "/pgd/latest/reference/conflict_functions#bdralter_table_conflict_detection", + "bdralter_node_set_conflict_resolver": "/pgd/latest/reference/conflict_functions#bdralter_node_set_conflict_resolver", + "bdralter_node_set_log_config": "/pgd/latest/reference/conflict_functions#bdralter_node_set_log_config", "bdrcolumn_timestamps_create": "/pgd/latest/reference/clcd#bdrcolumn_timestamps_create", "conflict-detection": "/pgd/latest/reference/conflicts#conflict-detection", "list-of-conflict-types": "/pgd/latest/reference/conflicts#list-of-conflict-types", diff --git a/product_docs/docs/pgd/5/reference/index.mdx b/product_docs/docs/pgd/5/reference/index.mdx index be0361b0926..082ad28f414 100644 --- a/product_docs/docs/pgd/5/reference/index.mdx +++ b/product_docs/docs/pgd/5/reference/index.mdx @@ -1,4 +1,5 @@ --- +# edit index.mdx.src DO NOT edit index.mdx or index.json title: "PGD reference" navTitle: "PGD reference" description: > @@ -21,6 +22,9 @@ navigation: - streamtriggers - catalogs-internal - functions-internal +- conflict_functions +- clcd +- conflicts --- The reference section is a definitive listing of all functions, views, and commands available in EDB Postgres Distributed. 
@@ -436,8 +440,14 @@ The reference section is a definitive listing of all functions, views, and comma * [`bdr.taskmgr_work_queue_check_status`](functions-internal#bdrtaskmgr_work_queue_check_status) -## [Column-level conflict detection](clcd) - * [bdr.column_timestamps_create](clcd#bdrcolumn_timestamps_create) +## [Conflict functions](conflict_functions) + * [`bdr.alter_table_conflict_detection`](conflict_functions#bdralter_table_conflict_detection) + * [`bdr.alter_node_set_conflict_resolver`](conflict_functions#bdralter_node_set_conflict_resolver) + * [`bdr.alter_node_set_log_config`](conflict_functions#bdralter_node_set_log_config) + + +## [Column-level conflict functions](clcd) + * [`bdr.column_timestamps_create`](clcd#bdrcolumn_timestamps_create) ## [Conflicts](conflicts) diff --git a/product_docs/docs/pgd/5/reference/index.mdx.src b/product_docs/docs/pgd/5/reference/index.mdx.src index c516ecb5a1c..10417a08c95 100644 --- a/product_docs/docs/pgd/5/reference/index.mdx.src +++ b/product_docs/docs/pgd/5/reference/index.mdx.src @@ -1,4 +1,5 @@ --- +# edit index.mdx.src DO NOT edit index.mdx or index.json title: "PGD reference" navTitle: "PGD reference" description: > @@ -21,6 +22,9 @@ navigation: - streamtriggers - catalogs-internal - functions-internal +- conflict_functions +- clcd +- conflicts --- The reference section is a definitive listing of all functions, views, and commands available in EDB Postgres Distributed. 
From 99374b6198b2cbc99d23e1f52855292b508eb3f1 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Fri, 12 Jul 2024 14:32:26 -0400 Subject: [PATCH 13/31] Update product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx Co-authored-by: Dj Walker-Morgan <126472455+djw-m@users.noreply.github.com> --- .../column-level-conflicts/02_enabling_disabling.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index 581b80c5eaf..f73f886576c 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -40,7 +40,7 @@ The new column specifies `NOT NULL` with a default value, which means that `ALTE !!! Note Avoid using columns with the `bdr.column_timestamps` data type for other purposes, as doing so can have negative effects. For example, it switches the table to column-level conflict resolution, which doesn't work correctly without the triggers. -### Listing table with column-level conflict resolution +### Listing tables with column-level conflict resolution You can list tables having column-level conflict resolution enabled with the following query. This query detects the presence of a column of type `bdr.column_timestamp`. From 65716cc425eafd8a7877a0f5e681e53fa4329990 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Fri, 12 Jul 2024 14:32:53 -0400 Subject: [PATCH 14/31] Fixed links. 
--- .../column-level-conflicts/02_enabling_disabling.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index f73f886576c..fd92e47eca3 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -7,10 +7,10 @@ redirects: --- !!! Note Permissions required -Column-level conflict detection uses the `column_timestamps` type. This type requires any user needing to detect column-level conflicts to have at least the [bdr_application](../security/pgd-predefined-roles/#bdr_application) role assigned. +Column-level conflict detection uses the `column_timestamps` type. This type requires any user needing to detect column-level conflicts to have at least the [bdr_application](../../security/pgd-predefined-roles/#bdr_application) role assigned. !!! -The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions/bdralter_table_conflict_detection) function manages column-level conflict resolution. +The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions/#bdralter_table_conflict_detection) function manages column-level conflict resolution. ### Example From 4d155ea4e5a71002e10473301af50f11cd22c5d2 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Fri, 12 Jul 2024 14:53:41 -0400 Subject: [PATCH 15/31] Moved crdt bit to overview and added link. 
--- .../column-level-conflicts/01_overview_clcd.mdx | 4 ++++ .../5/consistency/column-level-conflicts/05_clc_crdt.mdx | 8 -------- 2 files changed, 4 insertions(+), 8 deletions(-) delete mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index 99e844bee91..07fce792663 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -57,3 +57,7 @@ By treating the columns independently, it's easy to violate constraints in a way ``` Each of those updates is valid when executed on the initial row and so passes on each node. But when replicating to the other node, the resulting row violates the `CHECK (A > b)` constraint, and the replication stops until the issue is resolved manually. + +## Handling column-level conflicts using CRDT data types + +By default, column-level conflict resolution picks the value with a higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. For example, you can use [CRDT types](../crdt) that allow merging the conflicting values without discarding any information. 
\ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx deleted file mode 100644 index fc89fa290e7..00000000000 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/05_clc_crdt.mdx +++ /dev/null @@ -1,8 +0,0 @@ ---- -navTitle: Using CRDT data types -title: Handling column-level conflicts using CRDT data types -redirects: - - /pgd/latest/bdr/column-level-conflicts/ ---- - -By default, column-level conflict resolution picks the value with a higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. For example, you can use CRDT types that allow merging the conflicting values without discarding any information. \ No newline at end of file From 916c2b1d706e7f6702c5214c037f6c0d2e43f0d7 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Tue, 30 Jul 2024 15:14:39 -0400 Subject: [PATCH 16/31] Small changes. --- .../5/consistency/column-level-conflicts/01_overview_clcd.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index 07fce792663..454246ca78b 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -35,7 +35,7 @@ Applied to the previous example, the result is `(100,100)` on both nodes, despit When thinking about column-level conflict resolution, it can be useful to see tables as vertically partitioned, so that each update affects data in only one slice. This approach eliminates conflicts between changes to different subsets of columns. In fact, vertical partitioning can even be a practical alternative to column-level conflict resolution. 
-Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The `bdr.alter_table_conflict_detection` function checks that and fails with an error if this setting is missing. +Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The [bdr.alter_table_conflict_detection()](conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing. By treating the columns independently, it's easy to violate constraints in a way that isn't possible when all changes happen on the same node. Consider, for example, a table like this: From 73b39f4ec9ef494a81e14fa98284b5720974a972 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Wed, 31 Jul 2024 14:18:15 -0400 Subject: [PATCH 17/31] more small changes. --- .../column-level-conflicts/01_overview_clcd.mdx | 2 +- .../column-level-conflicts/02_enabling_disabling.mdx | 11 +++++++---- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index 454246ca78b..277c8177fbf 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -35,7 +35,7 @@ Applied to the previous example, the result is `(100,100)` on both nodes, despit When thinking about column-level conflict resolution, it can be useful to see tables as vertically partitioned, so that each update affects data in only one slice. This approach eliminates conflicts between changes to different subsets of columns. In fact, vertical partitioning can even be a practical alternative to column-level conflict resolution. -Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. 
The [bdr.alter_table_conflict_detection()](conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing. +Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The [bdr.alter_table_conflict_detection()](../reference/conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing. By treating the columns independently, it's easy to violate constraints in a way that isn't possible when all changes happen on the same node. Consider, for example, a table like this: diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index fd92e47eca3..80af1c31bf1 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -16,6 +16,10 @@ The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions/#b This example creates a table `test_table` and then enables column-level conflict resolution on it: +!!! Note DDL locking +When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. +!!! + ```sql db=# CREATE TABLE my_app.test_table (id SERIAL PRIMARY KEY, val INT); CREATE TABLE @@ -38,7 +42,8 @@ The function adds a `cts` column as specified in the function call. It also crea The new column specifies `NOT NULL` with a default value, which means that `ALTER TABLE ... 
ADD COLUMN` doesn't perform a table rewrite. !!! Note - Avoid using columns with the `bdr.column_timestamps` data type for other purposes, as doing so can have negative effects. For example, it switches the table to column-level conflict resolution, which doesn't work correctly without the triggers. +Avoid using columns with the `bdr.column_timestamps` data type for other purposes, as doing so can have negative effects. For example, it switches the table to column-level conflict resolution, which doesn't work correctly without the triggers. +!!! ### Listing tables with column-level conflict resolution @@ -58,7 +63,5 @@ WHERE NOT pg_is_other_temp_schema(nc.oid) AND c.relkind IN ('r', 'v', 'f', 'p'); ``` -!!! Note DDL locking -When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. -!!! + From 74f375744e1741c156fcb7ab4bccf27ea74618c1 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Wed, 31 Jul 2024 14:33:37 -0400 Subject: [PATCH 18/31] Tweaks in light of PR 5813 complete. 
--- .../02_enabling_disabling.mdx | 13 +++++++++++++ .../column-level-conflicts/04_timestamps.mdx | 12 +++--------- 2 files changed, 16 insertions(+), 9 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index 80af1c31bf1..0f8cd24dc7f 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -12,6 +12,8 @@ Column-level conflict detection uses the `column_timestamps` type. This type req The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions/#bdralter_table_conflict_detection) function manages column-level conflict resolution. + + ### Example This example creates a table `test_table` and then enables column-level conflict resolution on it: @@ -35,6 +37,17 @@ db(# 'column_modify_timestamp', 'cts'); t db=# \d my_app.test_table + Table "my_app.test_table" + Column | Type | Collation | Nullable | Default +--------+-----------------------+-----------+----------+-------------------------------------------------- + id | integer | | not null | nextval('my_app.test_table_id_seq'::regclass) + val | integer | | | + cts | bdr.column_timestamps | | not null | 's 1 775297963454602 0 0'::bdr.column_timestamps +Indexes: + "test_table_pkey" PRIMARY KEY, btree (id) +Triggers: + bdr_clcd_before_insert BEFORE INSERT ON my_app.test_table FOR EACH ROW EXECUTE FUNCTION bdr.column_timestamps_current_insert() + bdr_clcd_before_update BEFORE UPDATE ON my_app.test_table FOR EACH ROW EXECUTE FUNCTION bdr.column_timestamps_current_update() ``` The function adds a `cts` column as specified in the function call. It also creates two triggers (`BEFORE INSERT` and `BEFORE UPDATE`) that are responsible for maintaining timestamps in the new column before each change. 
diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx index 129d1765ca0..f9c57cb1fd0 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx @@ -13,7 +13,7 @@ An important decision is which timestamp to assign to modified columns. The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. (The value is ignored where possible.) !!! -By default, the timestamp assigned to modified columns is the current timestamp, as if obtained from `clock_timestamp`. This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns). +If `column_modify_timestamp` is selected as the conflict detection method, the timestamp assigned to modified columns is the current timestamp, as if obtained from `clock_timestamp`. This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns). It can, however, have various unexpected effects: @@ -25,18 +25,12 @@ A clock skew can occur between different nodes. It can induce somewhat unexpecte - The timestamp is unrelated to the commit timestamp. Using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it likely can't be serialized. +You can also use the actual commit timestamp, specified with `column_commit_timestamp` as the conflict detection method. + !!! Note Statement and transaction timestamps might be added in the future, which will address issues with mixing effects of concurrent statements or transactions. Still, neither of these options can ever produce results equivalent to commit order. !!! 
-You can also use the actual commit timestamp, although this feature is considered experimental. To use the commit timestamp, set the last parameter to `true` when enabling column-level conflict resolution: - -```sql -SELECT bdr.column_timestamps_enable('test_table'::regclass, 'cts', true); -``` - -You can disable it using `bdr.column_timestamps_disable`. - !!! Note When using regular timestamps to order changes or commits, the conflicting changes might have exactly the same timestamp because two or more nodes happened to generate the same timestamp. This risk isn't unique to column-level conflict resolution, as it can happen even for regular row-level conflict resolution. The node id is the tie breaker in this situation. The higher node id wins. This approach ensures that the same changes are applied on all nodes. !!! From 1fbe2b18b03980e41a2039924fa8e47a89fd3e41 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Thu, 1 Aug 2024 10:24:36 -0400 Subject: [PATCH 19/31] Small changes. --- .../column-level-conflicts/02_enabling_disabling.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index 0f8cd24dc7f..4f27cbce7f9 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -12,16 +12,16 @@ Column-level conflict detection uses the `column_timestamps` type. This type req The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions/#bdralter_table_conflict_detection) function manages column-level conflict resolution. +### DDL locking and managing timestamps +When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. 
+ +This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. ### Example This example creates a table `test_table` and then enables column-level conflict resolution on it: -!!! Note DDL locking -When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. -!!! - ```sql db=# CREATE TABLE my_app.test_table (id SERIAL PRIMARY KEY, val INT); CREATE TABLE From 7f4371291604957c8292278cf4223dd3ba3a603b Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Thu, 1 Aug 2024 11:33:27 -0400 Subject: [PATCH 20/31] rewrote timestamps --- .../column-level-conflicts/03_timestamps.mdx | 109 ++++++++++++++++++ 1 file changed, 109 insertions(+) create mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx new file mode 100644 index 00000000000..6cdf770ed1a --- /dev/null +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx @@ -0,0 +1,109 @@ +--- +navTitle: Timestamps +title: Timestamps in column-level conflict resolution +redirects: + - /pgd/latest/bdr/column-level-conflicts/ +--- + +When dealing with distributed databases, conflicts can happen. 
EDB's Postgres Distributed (PGD) provides mechanisms to handle these conflicts, one of which involves using timestamps. Understanding how timestamps work in this context is crucial for ensuring data consistency across nodes. + +## Current vs commit timestamp + +### Current timestamp + +When using `column_modify_timestamp` as the conflict detection method, the current timestamp is assigned to modified columns. This timestamp is similar to what is obtained from `clock_timestamp()`. + +#### Advantages + +- Simple to implement. +- Suitable when conflicting rows modify non-overlapping subsets of columns. + +#### Challenges + +- **Varying timestamps during execution:** Each row in an `UPDATE` statement may receive a slightly different timestamp, leading to potential mixed effects from concurrent changes on different nodes. +- **Clock Skew:** Differences in system clocks across nodes can cause unexpected behavior. +- **Serialization issues:** The current timestamp does not correlate with commit order, which can affect serializablity. + +### Commit timestamp + +Using `column_commit_timestamp` as the conflict detection method assigns the commit timestamp to the modified columns. This method aims to align more closely with the commit order of transactions. + +#### Future considerations: + +- Statement and transaction timestamps might be introduced to address the mixed effects of concurrent statements and transactions. However, these options will still not guarantee results equivalent to commit order. + +## Handling Timestamp Conflicts + +When using regular timestamps for ordering changes or commits, conflicts might arise if two or more nodes generate the same timestamp. In such cases, the node ID acts as a tie-breaker, with the higher node ID prevailing. This ensures consistent application of changes across all nodes. + +## Inspecting column timestamps + +The timestamps for modified columns are managed by triggers, and it is essential not to modify them directly. 
To investigate how a conflict was resolved, inspecting these timestamps can be helpful. + +### Useful functions + +#### `bdr.column_timestamps_to_text(bdr.column_timestamps)` + +Returns a human-readable representation of the timestamp mapping. + +```sql +SELECT cts::text FROM test_table; +``` + +Example output: + +``` +{source: current, default: 2018-09-23 19:24:52.118583+02, map: [2 : 2018-09-23 19:25:02.590677+02]} +``` + +#### `brd.column_timestamps_to_jsonb(bdr.column_timestamps)` + +Returns a JSONB representation of the timestamp mapping. + +```sql +SELECT jsonb_pretty(cts::jsonb) FROM test_table; +``` + +Example output: + +```json +{ + "map": { + "2": "2018-09-23T19:24:52.118583+02:00" + }, + "source": "current", + "default": "2018-09-23T19:24:52.118583+02:00" +} +``` + +#### `bdr.column_timestamps_resolve(brd.column_timestamps, xid) + +Updates the mapping with the commit timestamp for attributes modified by the most recent transaction, if it has already committed. + +```sql +SELECT bdr.column_timestamps_resolve(cts, xmin)::jsonb FROM test_table; +``` + +Example output: + +```json +{ + "map": { + "2": "2018-09-23T19:29:55.581823+02:00" + }, + "source": "commit", + "default": "2018-09-23T19:29:55.581823+02:00" +} +``` + +## Important considerations + +- **Automated Management:** The column storing timestamp mapping is managed automatically. Do not specify or override the value in your queries, as this can lead to unpredictable results. +- **Trigger order:** The order in which triggers execute is critical. Custom triggers modifying tuples after the pgl_clcd_ triggers may not detect modified columns correctly. + +By understanding these concepts, you can effectively manage column-level conflicts in EDB's Postgres Distributed (PGD) and ensure data consistency across your distributed database environment. 
+ + + + + From f1cd7fc1175e3d4364e2e63a3ad914c9a52979bf Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Thu, 1 Aug 2024 14:35:01 -0400 Subject: [PATCH 21/31] Small change. --- .../pgd/5/consistency/column-level-conflicts/03_timestamps.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx index 6cdf770ed1a..a18d5e6de20 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx @@ -101,7 +101,7 @@ Example output: - **Automated Management:** The column storing timestamp mapping is managed automatically. Do not specify or override the value in your queries, as this can lead to unpredictable results. - **Trigger order:** The order in which triggers execute is critical. Custom triggers modifying tuples after the pgl_clcd_ triggers may not detect modified columns correctly. -By understanding these concepts, you can effectively manage column-level conflicts in EDB's Postgres Distributed (PGD) and ensure data consistency across your distributed database environment. +By understanding these concepts, you can effectively manage column-level conflicts in PGD and ensure data consistency across your distributed database environment. From 4e5be3fba7471e8bcb077cabdb0aa5e1150654d7 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Thu, 1 Aug 2024 15:36:52 -0400 Subject: [PATCH 22/31] Integrated some of DJ's comments from Slack. 
--- .../5/consistency/column-level-conflicts/01_overview_clcd.mdx | 4 ++-- .../column-level-conflicts/02_enabling_disabling.mdx | 4 ++-- .../docs/pgd/5/consistency/column-level-conflicts/index.mdx | 4 +--- 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index 277c8177fbf..89c4b4ac055 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -35,7 +35,7 @@ Applied to the previous example, the result is `(100,100)` on both nodes, despit When thinking about column-level conflict resolution, it can be useful to see tables as vertically partitioned, so that each update affects data in only one slice. This approach eliminates conflicts between changes to different subsets of columns. In fact, vertical partitioning can even be a practical alternative to column-level conflict resolution. -Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The [bdr.alter_table_conflict_detection()](../reference/conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing. +Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing. By treating the columns independently, it's easy to violate constraints in a way that isn't possible when all changes happen on the same node. 
Consider, for example, a table like this: @@ -58,6 +58,6 @@ By treating the columns independently, it's easy to violate constraints in a way Each of those updates is valid when executed on the initial row and so passes on each node. But when replicating to the other node, the resulting row violates the `CHECK (A > b)` constraint, and the replication stops until the issue is resolved manually. -## Handling column-level conflicts using CRDT data types +### Handling column-level conflicts using CRDT data types By default, column-level conflict resolution picks the value with a higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. For example, you can use [CRDT types](../crdt) that allow merging the conflicting values without discarding any information. \ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index 4f27cbce7f9..ce3f4fc2504 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -18,9 +18,9 @@ When enabling or disabling column timestamps on a table, the code uses DDL locki This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. 
-### Example +### Example of enabling column-level conflict resolution using bdr.alter_table_conflict_detection -This example creates a table `test_table` and then enables column-level conflict resolution on it: +The [bdr.alter_table_conflict_detection](../../reference/conflict_functions/#bdralter_table_conflict_detection) function takes a table name and column name as its arguments. The column will be added to the table as a column_modify_timestamp column. The function also adds two triggers (BEFORE INSERT and BEFORE UPDATE) that are responsible for maintaining timestamps in the new column before each change. ```sql db=# CREATE TABLE my_app.test_table (id SERIAL PRIMARY KEY, val INT); diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx index eaad26db1d4..65105c7e91d 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx @@ -13,6 +13,4 @@ However, it might sometimes be appropriate to resolve conflicts at the column le * [Enabling and disabling](02_enabling_disabling) provides an example of enabling column-level conflict resolution and introduces [`bdr.column_timestamps_create`](02_enabling_disabling/#bdrcolumn_timestamps_create). -* [Timestamps](04_timestamps) explicates how timestamps can be selected and inspected. - -* [Handling column-level conflicts using CRDT data types](05_clc_crdt) notes how column-level conflict resolution can reconcile using DRDT types that allow merging conflicts. +* [Timestamps](03_timestamps) explicates how timestamps can be selected and inspected. From 1e85d76e33ed27ade7e7e608e8c79612018e4211 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Fri, 2 Aug 2024 11:50:38 -0400 Subject: [PATCH 23/31] Added DJs rewrite of Timestamps in and deleted old Timestamps topic. 
--- .../column-level-conflicts/03_timestamps.mdx | 128 ++++++++---------- .../column-level-conflicts/04_timestamps.mdx | 97 ------------- 2 files changed, 58 insertions(+), 167 deletions(-) delete mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx index a18d5e6de20..8b9aa167952 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx @@ -5,105 +5,93 @@ redirects: - /pgd/latest/bdr/column-level-conflicts/ --- -When dealing with distributed databases, conflicts can happen. EDB's Postgres Distributed (PGD) provides mechanisms to handle these conflicts, one of which involves using timestamps. Understanding how timestamps work in this context is crucial for ensuring data consistency across nodes. +As previously mentioned, column-level conflict resolution depends on a timestamp column being included in the table. -## Current vs commit timestamp +## Comparing `column_modify_timestamp` and `column_commit_timestamp` -### Current timestamp +When you select one of the two column-level conflict detection methods, a column is added to the table which contains a mapping of modified columns and timestamps. -When using `column_modify_timestamp` as the conflict detection method, the current timestamp is assigned to modified columns. This timestamp is similar to what is obtained from `clock_timestamp()`. +The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. Where possible, user attempts to do so will be ignored. -#### Advantages +### `column_modify_timestamp` -- Simple to implement. -- Suitable when conflicting rows modify non-overlapping subsets of columns. 
+When `column_modify_timestamp` is selected as the conflict detection method, the timestamp assigned to the modified columns is the current timestamp, similar to the value you could get running `SELECT clock_timestamp()`. -#### Challenges +This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns). -- **Varying timestamps during execution:** Each row in an `UPDATE` statement may receive a slightly different timestamp, leading to potential mixed effects from concurrent changes on different nodes. -- **Clock Skew:** Differences in system clocks across nodes can cause unexpected behavior. -- **Serialization issues:** The current timestamp does not correlate with commit order, which can affect serializablity. +Its simplicity can, though, lead to unexpected effects. -### Commit timestamp +For example, if an `UPDATE` affects multiple rows, the clock continues ticking while the `UPDATE` runs. So each row gets a slightly different timestamp, even if they are being modified concurrently by the one `UPDATE`. This, in turn, means that the effects of concurrent changes might get "mixed" in various ways, depending on how the changes performed on different nodes interleave. -Using `column_commit_timestamp` as the conflict detection method assigns the commit timestamp to the modified columns. This method aims to align more closely with the commit order of transactions. +Another possible issue is clock skew. When the clocks on different nodes drift, so do the timestamps. This clock skew can induce unexpected behavior such as newer changes being discarded because the timestamps are apparently switched around. 
However, you can manage clock skew between nodes using the parameters [bdr.maximum_clock_skew](https://www.enterprisedb.com/docs/pgd/latest/reference/pgd-settings/#bdrmaximum_clock_skew) and [bdr.maximum_clock_skew_action](https://www.enterprisedb.com/docs/pgd/latest/reference/pgd-settings/#bdrmaximum_clock_skew_action). -#### Future considerations: +As the current timestamp is unrelated to the commit timestamp, using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it probably can't be serialized. -- Statement and transaction timestamps might be introduced to address the mixed effects of concurrent statements and transactions. However, these options will still not guarantee results equivalent to commit order. +When using current timestamps to order changes or commits, the conflicting changes might have exactly the same timestamp because two or more nodes happened to generate the same timestamp. This risk isn't unique to column-level conflict resolution, as it can happen even for regular row-level conflict resolution. The node id is used as the tiebreaker in this situation. The higher node id wins. This approach ensures that the same changes are applied on all nodes. -## Handling Timestamp Conflicts +### `column_commit_timestamp` -When using regular timestamps for ordering changes or commits, conflicts might arise if two or more nodes generate the same timestamp. In such cases, the node ID acts as a tie-breaker, with the higher node ID prevailing. This ensures consistent application of changes across all nodes. +You can also use the actual commit timestamp, specified with `column_commit_timestamp` as the conflict detection method. This has the advantage of using the commit time, which will be the same for all changes made within an `UPDATE`. -## Inspecting column timestamps +!!! 
Note +Statement and transaction timestamps might be added in the future, which will address issues with mixing effects of concurrent statements or transactions. Still, neither of these options can ever produce results equivalent to commit order. +!!! -The timestamps for modified columns are managed by triggers, and it is essential not to modify them directly. To investigate how a conflict was resolved, inspecting these timestamps can be helpful. +## Inspecting column timestamps -### Useful functions +The column storing timestamps for modified columns is maintained by triggers. Don't modify it directly. It can be useful to inspect the current timestamps value, for example, while investigating how a conflict was resolved. -#### `bdr.column_timestamps_to_text(bdr.column_timestamps)` - -Returns a human-readable representation of the timestamp mapping. - -```sql -SELECT cts::text FROM test_table; -``` +!!! Note +The timestamp mapping is maintained by triggers and the order in which triggers execute matters. So if you have custom triggers that modify tuples and are executed after the `pgl_clcd_` triggers, the modified columns aren't detected correctly. +!!! -Example output: +These functions are useful for inspecting timestamps: -``` -{source: current, default: 2018-09-23 19:24:52.118583+02, map: [2 : 2018-09-23 19:25:02.590677+02]} -``` - -#### `brd.column_timestamps_to_jsonb(bdr.column_timestamps)` +#### `bdr.column_timestamps_to_text(bdr.column_timestamps)` -Returns a JSONB representation of the timestamp mapping.
+This function returns a human-readable representation of the timestamp mapping and is used when casting the value to `text`: ```sql -SELECT jsonb_pretty(cts::jsonb) FROM test_table; -``` +db=# select cts::text from test_table; + cts +----------------------------------------------------------------------------------------------------- + {source: current, default: 2018-09-23 19:24:52.118583+02, map: [2 : 2018-09-23 19:25:02.590677+02]} +(1 row) -Example output: - -```json -{ - "map": { - "2": "2018-09-23T19:24:52.118583+02:00" - }, - "source": "current", - "default": "2018-09-23T19:24:52.118583+02:00" -} ``` -#### `bdr.column_timestamps_resolve(brd.column_timestamps, xid) +#### `bdr.column_timestamps_to_jsonb(bdr.column_timestamps)` -Updates the mapping with the commit timestamp for attributes modified by the most recent transaction, if it has already committed. +This function returns a JSONB representation of the timestamps mapping and is used when casting the value to `jsonb`: ```sql -SELECT bdr.column_timestamps_resolve(cts, xmin)::jsonb FROM test_table; +db=# select jsonb_pretty(cts::jsonb) from test_table; + jsonb_pretty +--------------------------------------------------- + { + + "map": { + + "2": "2018-09-23T19:24:52.118583+02:00" + + }, + + "source": "current", + + "default": "2018-09-23T19:24:52.118583+02:00"+ + } +(1 row) ``` -Example output: - -```json -{ - "map": { - "2": "2018-09-23T19:29:55.581823+02:00" - }, - "source": "commit", - "default": "2018-09-23T19:29:55.581823+02:00" -} -``` - -## Important considerations - -- **Automated Management:** The column storing timestamp mapping is managed automatically. Do not specify or override the value in your queries, as this can lead to unpredictable results. -- **Trigger order:** The order in which triggers execute is critical. Custom triggers modifying tuples after the pgl_clcd_ triggers may not detect modified columns correctly.
- -By understanding these concepts, you can effectively manage column-level conflicts in PGD and ensure data consistency across your distributed database environment. - - - +#### `bdr.column_timestamps_resolve(bdr.column_timestamps, xid)` +This function updates the mapping with the commit timestamp for the attributes modified by the most recent transaction if it has already committed. This matters only when using the commit timestamp. For example, in this case, the last transaction updated the second attribute (with `attnum = 2`): +```sql +test=# select cts::jsonb from test_table; + cts +---------------------------------------------------------------------------------------------------------------------------------------- + {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00", "modified": [2]} +(1 row) + +db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; + column_timestamps_resolve +----------------------------------------------------------------------------------------------------------------------- + {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00"} +(1 row) +``` \ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx deleted file mode 100644 index f9c57cb1fd0..00000000000 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/04_timestamps.mdx +++ /dev/null @@ -1,97 +0,0 @@ ---- -navTitle: Timestamps -title: Timestamps -redirects: - - /pgd/latest/bdr/column-level-conflicts/ ---- - -## Current versus commit timestamp - -An important decision is which timestamp to assign to modified columns. - -!!! Note -The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable.
(The value is ignored where possible.) -!!! - -If `column_modify_timestamp` is selected as the conflict detection method, the timestamp assigned to modified columns is the current timestamp, as if obtained from `clock_timestamp`. This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns). - -It can, however, have various unexpected effects: - -- The timestamp changes during statement execution. So, if an `UPDATE` affects multiple rows, each gets a slightly different timestamp. This means that the effects of concurrent changes might get "mixed" in various ways, depending on how the changes performed on different nodes interleave. - -!!! Note -A clock skew can occur between different nodes. It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. -!!! - -- The timestamp is unrelated to the commit timestamp. Using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it likely can't be serialized. - -You can also use the actual commit timestamp, specified with `column_commit_timestamp` as the conflict detection method. - -!!! Note -Statement and transaction timestamps might be added in the future, which will address issues with mixing effects of concurrent statements or transactions. Still, neither of these options can ever produce results equivalent to commit order. -!!! - -!!! Note -When using regular timestamps to order changes or commits, the conflicting changes might have exactly the same timestamp because two or more nodes happened to generate the same timestamp. This risk isn't unique to column-level conflict resolution, as it can happen even for regular row-level conflict resolution. The node id is the tie breaker in this situation. The higher node id wins. 
This approach ensures that the same changes are applied on all nodes. -!!! - -## Inspecting column timestamps - -The column storing timestamps for modified columns is maintained by triggers. Don't modify it directly. It can be useful to inspect the current timestamps value, for example, while investigating how a conflict was resolved. - -!!! Note -The timestamp mapping is maintained by triggers and the order in which triggers execute matters. So if you have custom triggers that modify tuples and are executed after the `pgl_clcd_` triggers, the modified columns aren't detected correctly. -!!! - -Three functions are useful for this purpose: - -- `bdr.column_timestamps_to_text(bdr.column_timestamps)` - - This function returns a human-readable representation of the timestamp mapping and - is used when casting the value to `text`: - -```sql -db=# select cts::text from test_table; - cts ------------------------------------------------------------------------------------------------------ - {source: current, default: 2018-09-23 19:24:52.118583+02, map: [2 : 2018-09-23 19:25:02.590677+02]} -(1 row) - -``` - -- `bdr.column_timestamps_to_jsonb(bdr.column_timestamps)` - - This function turns a JSONB representation of the timestamps mapping and is used - when casting the value to `jsonb`: - -```sql -db=# select jsonb_pretty(cts::jsonb) from test_table; - jsonb_pretty ---------------------------------------------------- - { + - "map": { + - "2": "2018-09-23T19:24:52.118583+02:00" + - }, + - "source": "current", + - "default": "2018-09-23T19:24:52.118583+02:00"+ - } -(1 row) -``` - -- `bdr.column_timestamps_resolve(bdr.column_timestamps, xid)` - - This function updates the mapping with the commit timestamp for the attributes modified by the most recent transaction if it already committed. This matters only when using the commit timestamp. 
For example, in this case, the last transaction updated the second attribute (with `attnum = 2`): - -```sql -test=# select cts::jsonb from test_table; - cts ----------------------------------------------------------------------------------------------------------------------------------------- - {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00", "modified": [2]} -(1 row) - -db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; - column_timestamps_resolve ------------------------------------------------------------------------------------------------------------------------ - {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00"} -(1 row) -``` From ab9b4e258ea0c4660996cd064c5fd72518eb32f2 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Fri, 2 Aug 2024 13:51:36 -0400 Subject: [PATCH 24/31] More tweaks to align with DJ's feedback. --- .../column-level-conflicts/01_overview_clcd.mdx | 10 +++++++--- .../02_enabling_disabling.mdx | 14 +++----------- .../column-level-conflicts/03_timestamps.mdx | 6 +++++- .../5/consistency/column-level-conflicts/index.mdx | 4 ++-- 4 files changed, 17 insertions(+), 17 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index 89c4b4ac055..a9faa238647 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -9,6 +9,8 @@ By default, conflicts are resolved at row level. When changes from two nodes con However, it might sometimes be appropriate to resolve conflicts at the column level rather than the row level, at least in some cases. 
+## When to resolve at the column level + Consider a simple example in which table t has two integer columns, a and b, and a single row `(1,1)`. On one node execute: ```sql @@ -25,7 +27,7 @@ UPDATE t SET b = 100 The attributes modified by an `UPDATE` are determined by comparing the old and new row in a trigger. This means that if the attribute doesn't change a value, it isn't detected as modified even if it's explicitly set. For example, `UPDATE t SET a = a` doesn't mark `a` as modified for any row. Similarly, `UPDATE t SET a = 1` doesn't mark `a` as modified for rows that are already set to `1`. !!! -This sequence results in an `UPDATE-UPDATE` conflict. With the `update_if_newer` conflict resolution, the commit timestamps are compared, and the new row version is kept. Assuming the second node committed last, the result is `(1,100)`, which effectively discards the change to column a. +This sequence results in an `UPDATE-UPDATE` conflict. With the [`update_if_newer`](../../reference/conflicts/#default-conflict-resolvers) conflict resolution, the commit timestamps are compared, and the new row version is kept. Assuming the second node committed last, the result is `(1,100)`, which effectively discards the change to column a. For many use cases, this behavior is desired and expected. However, for some use cases, this might be an issue. Consider, for example, a multi-node cluster where each part of the application is connected to a different node, updating a dedicated subset of columns in a shared table. In that case, the different components might conflict and overwrite changes. @@ -37,6 +39,8 @@ When thinking about column-level conflict resolution, it can be useful to see ta Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing. 
+## Special problems for column-level conflict resolution + By treating the columns independently, it's easy to violate constraints in a way that isn't possible when all changes happen on the same node. Consider, for example, a table like this: ```sql @@ -56,8 +60,8 @@ By treating the columns independently, it's easy to violate constraints in a way UPDATE t SET b = 500; ``` -Each of those updates is valid when executed on the initial row and so passes on each node. But when replicating to the other node, the resulting row violates the `CHECK (A > b)` constraint, and the replication stops until the issue is resolved manually. +Each of those updates is valid when executed on the initial row and so passes on each node. But when replicating to the other node, the resulting row violates the `CHECK (a > b)` constraint, and the replication stops until the issue is resolved manually. -### Handling column-level conflicts using CRDT data types +## Handling column-level conflicts using CRDT data types By default, column-level conflict resolution picks the value with a higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. For example, you can use [CRDT types](../crdt) that allow merging the conflicting values without discarding any information. \ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index ce3f4fc2504..a366aa50904 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -12,20 +12,14 @@ Column-level conflict detection uses the `column_timestamps` type. 
This type req The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions/#bdralter_table_conflict_detection) function manages column-level conflict resolution. -### DDL locking and managing timestamps - -When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. - -This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. - -### Example of enabling column-level conflict resolution using bdr.alter_table_conflict_detection +## Example of enabling column-level conflict resolution using bdr.alter_table_conflict_detection The [bdr.alter_table_conflict_detection](../../reference/conflict_functions/#bdralter_table_conflict_detection) function takes a table name and column name as its arguments. The column will be added to the table as a column_modify_timestamp column. The function also adds two triggers (BEFORE INSERT and BEFORE UPDATE) that are responsible for maintaining timestamps in the new column before each change. ```sql db=# CREATE TABLE my_app.test_table (id SERIAL PRIMARY KEY, val INT); CREATE TABLE - + db=# ALTER TABLE my_app.test_table REPLICA IDENTITY FULL; ALTER TABLE @@ -58,7 +52,7 @@ The new column specifies `NOT NULL` with a default value, which means that `ALTE Avoid using columns with the `bdr.column_timestamps` data type for other purposes, as doing so can have negative effects. For example, it switches the table to column-level conflict resolution, which doesn't work correctly without the triggers. !!! 
-### Listing tables with column-level conflict resolution +## Listing tables with column-level conflict resolution You can list tables having column-level conflict resolution enabled with the following query. This query detects the presence of a column of type `bdr.column_timestamp`. @@ -76,5 +70,3 @@ WHERE NOT pg_is_other_temp_schema(nc.oid) AND c.relkind IN ('r', 'v', 'f', 'p'); ``` - - diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx index 8b9aa167952..6b9e0c00c7a 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx @@ -13,6 +13,10 @@ When you select one of the two column-level conflict detection methods, a column The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. Where possible, user attempts to do so will be ignored. +When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. + +This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. + ### `column_modify_timestamp` When `column_modify_timestamp` is selected as the conflict detection method, the timestamp assigned to the modified columns is the current timestamp, similar to the value you could get running `select_clock_timestamp()`. @@ -27,7 +31,7 @@ Another possible issue is clock skew. 
Where the clocks on different nodes drift, As the current timestamp is unrelated to the commit timestamp, using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it is probably cannot be serialized. -When using current timestamps to order changes or commits, the conflicting changes might have exactly the same timestamp because two or more nodes happened to generate the same timestamp. This risk isn't unique to column-level conflict resolution, as it can happen even for regular row-level conflict resolution. The node id is used as the tiebreaker in this situation. The higher node id wins. This approach ensures that the same changes are applied on all nodes. +When using current timestamps to order changes or commits, the conflicting changes might have exactly the same timestamp because two or more nodes happened to generate the same timestamp. This risk isn't unique to column-level conflict resolution, as it can happen even for regular row-level conflict resolution. The node id is used as the tiebreaker in this situation. The higher node id wins. This approach ensures that the same changes are applied on all nodes. ### `column_commit_timestamp` diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx index 65105c7e91d..7a4c88c8684 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx @@ -9,8 +9,8 @@ By default, conflicts are resolved at row level. When changes from two nodes con However, it might sometimes be appropriate to resolve conflicts at the column level rather than the row level, at least in some cases. -* [Overview](01_overview_clcd) introduces the notion of a column-level conflict in contrast to row-level conflicts. 
+* [Overview](01_overview_clcd) introduces column-level conflict resolution in contrast to row-level conflict resolution. * [Enabling and disabling](02_enabling_disabling) provides an example of enabling column-level conflict resolution and introduces [`bdr.column_timestamps_create`](02_enabling_disabling/#bdrcolumn_timestamps_create). -* [Timestamps](03_timestamps) explicates how timestamps can be selected and inspected. +* [Timestamps](03_timestamps) explicates the difference between using `column_modify_timestamp` and `column_commit_timestamp` and shows how the timestamps associated with column-level conflict resolution can be selected and inspected. From 2de40ea7c3cd5e6321caddd3e6d3cd5b75f1e8bb Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Fri, 2 Aug 2024 14:20:14 -0400 Subject: [PATCH 25/31] More little changes. --- .../column-level-conflicts/01_overview_clcd.mdx | 2 +- .../column-level-conflicts/02_enabling_disabling.mdx | 8 +++----- .../consistency/column-level-conflicts/03_timestamps.mdx | 2 +- 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index a9faa238647..9d032f3df8f 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -37,7 +37,7 @@ Applied to the previous example, the result is `(100,100)` on both nodes, despit When thinking about column-level conflict resolution, it can be useful to see tables as vertically partitioned, so that each update affects data in only one slice. This approach eliminates conflicts between changes to different subsets of columns. In fact, vertical partitioning can even be a practical alternative to column-level conflict resolution. -Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. 
The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing. +Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The [bdr.alter_table_conflict_detection()](https://www.enterprisedb.com/docs/pgd/latest/reference/conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing. ## Special problems for column-level conflict resolution diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index a366aa50904..f284f333573 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -10,11 +10,11 @@ redirects: Column-level conflict detection uses the `column_timestamps` type. This type requires any user needing to detect column-level conflicts to have at least the [bdr_application](../../security/pgd-predefined-roles/#bdr_application) role assigned. !!! -The [bdr.alter_table_conflict_detection()](../../reference/conflict_functions/#bdralter_table_conflict_detection) function manages column-level conflict resolution. +The [bdr.alter_table_conflict_detection()](https://www.enterprisedb.com/docs/pgd/latest/reference/conflict_functions/#bdralter_table_conflict_detection) function manages column-level conflict resolution. -## Example of enabling column-level conflict resolution using bdr.alter_table_conflict_detection +## Using bdr.alter_table_conflict_detection to enable column-level conflict resolution -The [bdr.alter_table_conflict_detection](../../reference/conflict_functions/#bdralter_table_conflict_detection) function takes a table name and column name as its arguments. 
The column will be added to the table as a column_modify_timestamp column. The function also adds two triggers (BEFORE INSERT and BEFORE UPDATE) that are responsible for maintaining timestamps in the new column before each change. +The [bdr.alter_table_conflict_detection](https://www.enterprisedb.com/docs/pgd/latest/reference/conflict_functions/#bdralter_table_conflict_detection) function takes a table name and column name as its arguments. The column is added to the table as a `column_modify_timestamp` column. The function also adds two triggers (BEFORE INSERT and BEFORE UPDATE) that are responsible for maintaining timestamps in the new column before each change. ```sql db=# CREATE TABLE my_app.test_table (id SERIAL PRIMARY KEY, val INT); @@ -44,8 +44,6 @@ Triggers: bdr_clcd_before_update BEFORE UPDATE ON my_app.test_table FOR EACH ROW EXECUTE FUNCTION bdr.column_timestamps_current_update() ``` -The function adds a `cts` column as specified in the function call. It also creates two triggers (`BEFORE INSERT` and `BEFORE UPDATE`) that are responsible for maintaining timestamps in the new column before each change. - The new column specifies `NOT NULL` with a default value, which means that `ALTER TABLE ... ADD COLUMN` doesn't perform a table rewrite. !!! Note diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx index 6b9e0c00c7a..844ca2eba89 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx @@ -21,7 +21,7 @@ This approach ensures only conflicts with timestamps in both tuples or in neithe When `column_modify_timestamp` is selected as the conflict detection method, the timestamp assigned to the modified columns is the current timestamp, similar to the value you could get running `select_clock_timestamp()`. 
-This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns) +This is simple, and for many cases, it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns) Its simplicity can, though, lead to unexpected effects. From ea6d54c3cba0f8bdbf478852f9e990334a1b6763 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Fri, 2 Aug 2024 14:52:36 -0400 Subject: [PATCH 26/31] Final tweaks and removing original clc mdx file. --- .../5/consistency/column-level-conflicts.mdx | 239 ------------------ .../02_enabling_disabling.mdx | 3 +- .../column-level-conflicts/03_timestamps.mdx | 4 +- .../column-level-conflicts/index.mdx | 4 +- 4 files changed, 5 insertions(+), 245 deletions(-) delete mode 100644 product_docs/docs/pgd/5/consistency/column-level-conflicts.mdx diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts.mdx deleted file mode 100644 index 6fe4df7f3f0..00000000000 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts.mdx +++ /dev/null @@ -1,239 +0,0 @@ ---- -navTitle: Column-level conflict resolution -title: Column-level conflict detection -redirects: - - /pgd/latest/bdr/column-level-conflicts/ ---- - -By default, conflicts are resolved at row level. When changes from two nodes conflict, either the local or remote tuple is selected and the other is discarded. For example, commit timestamps for the two conflicting changes might be compared and the newer one kept. This approach ensures that all nodes converge to the same result and establishes commit-order-like semantics on the whole cluster. - -However, in some cases it might be appropriate to resolve conflicts at the column level rather than the row level. - -Consider a simple example, in which table t has two integer columns a and b and a single row `(1,1)`. 
On one node execute: - -```sql -UPDATE t SET a = 100 -``` - -On another node, before receiving the preceding `UPDATE`, concurrently execute: - -```sql -UPDATE t SET b = 100 -``` - -This sequence results in an `UPDATE-UPDATE` conflict. With the `update_if_newer` conflict resolution, the commit timestamps are compared, and the new row version is kept. Assuming the second node committed last, the result is `(1,100)`, which effectively discards the change to column a. - -For many use cases, this behavior is the desired and expected. However, for some use cases, this might be an issue. Consider, for example, a multi-node cluster where each part of the application is connected to a different node, updating a dedicated subset of columns in a shared table. In that case, the different components might conflict and overwrite changes. - -For such use cases, it might be more appropriate to resolve conflicts on a given table at the column level. To achieve that, PGD tracks the timestamp of the last change for each column separately and uses that to pick the most recent value, essentially performing `update_if_newer`. - -Applied to the previous example, the result is `(100,100)` on both nodes, despite neither of the nodes ever seeing such a row. - -When thinking about column-level conflict resolution, it can be useful to see tables as vertically partitioned, so that each update affects data in only one slice. This approach eliminates conflicts between changes to different subsets of columns. In fact, vertical partitioning can even be a practical alternative to column-level conflict resolution. - -Column-level conflict resolution requires the table to have `REPLICA IDENTITY FULL`. The [bdr.alter_table_conflict_detection()](../reference/conflict_functions#bdralter_table_conflict_detection) function checks that and fails with an error if this setting is missing. - -## Enabling and disabling column-level conflict resolution - -!!! 
Note Permissions required -Column-level conflict detection uses the `column_timestamps` type. This type requires any user needing to detect column-level conflicts to have at least the [bdr_application](../security/pgd-predefined-roles/#bdr_application) role assigned. -!!! - -The [bdr.alter_table_conflict_detection()](../reference/conflict_functions/#bdralter_table_conflict_detection) function manages column-level conflict resolution. - -### Example - -This example creates a table `test_table` and then enables column-level conflict resolution on it: - -```sql -db=# CREATE TABLE my_app.test_table (id SERIAL PRIMARY KEY, val INT); -CREATE TABLE - -db=# ALTER TABLE my_app.test_table REPLICA IDENTITY FULL; -ALTER TABLE - -db=# SELECT bdr.alter_table_conflict_detection( -db(# 'my_app.test_table'::regclass, -db(# 'column_modify_timestamp', 'cts'); - alter_table_conflict_detection --------------------------------- - t - -db=# \d my_app.test_table - Table "my_app.test_table" - Column | Type | Collation | Nullable | Default ---------+-----------------------+-----------+----------+-------------------------------------------------- - id | integer | | not null | nextval('my_app.test_table_id_seq'::regclass) - val | integer | | | - cts | bdr.column_timestamps | | not null | 's 1 775297963454602 0 0'::bdr.column_timestamps -Indexes: - "test_table_pkey" PRIMARY KEY, btree (id) -Triggers: - bdr_clcd_before_insert BEFORE INSERT ON my_app.test_table FOR EACH ROW EXECUTE FUNCTION bdr.column_timestamps_current_insert() - bdr_clcd_before_update BEFORE UPDATE ON my_app.test_table FOR EACH ROW EXECUTE FUNCTION bdr.column_timestamps_current_update() -``` - -The function adds a `cts` column as specified in the function call. It also creates two triggers (`BEFORE INSERT` and `BEFORE UPDATE`) that are responsible for maintaining timestamps in the new column before each change. - -The new column specifies `NOT NULL` with a default value, which means that `ALTER TABLE ... 
ADD COLUMN` doesn't perform a table rewrite. - -!!! Note - Avoid using columns with the `bdr.column_timestamps` data type for other purposes, as doing so can have negative effects. For example, it switches the table to column-level conflict resolution, which doesn't work correctly without the triggers. - -### Listing table with column-level conflict resolution - -You can list tables having column-level conflict resolution enabled with the following query. This query detects the presence of a column of type `bdr.column_timestamp`. - -```sql -SELECT nc.nspname, c.relname -FROM pg_attribute a -JOIN (pg_class c JOIN pg_namespace nc ON c.relnamespace = nc.oid) - ON a.attrelid = c.oid -JOIN (pg_type t JOIN pg_namespace nt ON t.typnamespace = nt.oid) - ON a.atttypid = t.oid -WHERE NOT pg_is_other_temp_schema(nc.oid) - AND nt.nspname = 'bdr' - AND t.typname = 'column_timestamps' - AND NOT a.attisdropped - AND c.relkind IN ('r', 'v', 'f', 'p'); -``` - -### bdr.column_timestamps_create - -This function creates column-level conflict resolution. It's called within `column_timestamp_enable`. - -#### Synopsis - -```sql -bdr.column_timestamps_create(p_source cstring, p_timestamp timestampstz) -``` - -#### Parameters - -- `p_source` — The two options are `current` or `commit`. -- `p_timestamp` — Timestamp depends on the source chosen. If `commit`, then `TIMESTAMP_SOURCE_COMMIT`. If `current`, then `TIMESTAMP_SOURCE_CURRENT`. - -## DDL locking - -When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. 
- -## Current versus commit timestamp - -An important decision is the timestamp to assign to modified columns. - -If `column_modify_timestamp` is selected as the conflict detection method, the timestamp assigned to modified columns is the current timestamp, as if obtained from `clock_timestamp`. This is simple, and for many cases it is correct (for example, when the conflicting rows modify non-overlapping subsets of columns). - -It can, however, have various unexpected effects: - -- The timestamp changes during statement execution. So, if an `UPDATE` affects multiple rows, each gets a slightly different timestamp. This means that the effects of concurrent changes might get "mixed" in various ways, depending on how the changes performed on different nodes interleave. - -- The timestamp is unrelated to the commit timestamp. Using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it likely can't be serialized. - -You can also use the actual commit timestamp, specified with `column_commit_timestamp` as the conflict detection method. -Commit timestamps currently have restrictions that are explained in [Notes](#notes). - -!!! Note - Statement and transaction timestamps might be added in the future, which will address issues with mixing effects of concurrent statements or transactions. Still, neither of these options can ever produce results equivalent to commit order. - -## Inspecting column timestamps - -The column storing timestamps for modified columns is maintained by triggers. Don't modify it directly. It can be useful to inspect the current timestamps value, for example, while investigating how a conflict was resolved. 
- -Three functions are useful for this purpose: - -- `bdr.column_timestamps_to_text(bdr.column_timestamps)` - - This function returns a human-readable representation of the timestamp mapping and - is used when casting the value to `text`: - -```sql -db=# select cts::text from test_table; - cts ------------------------------------------------------------------------------------------------------ - {source: current, default: 2018-09-23 19:24:52.118583+02, map: [2 : 2018-09-23 19:25:02.590677+02]} -(1 row) - -``` - -- `bdr.column_timestamps_to_jsonb(bdr.column_timestamps)` - - This function returns a JSONB representation of the timestamps mapping and is used - when casting the value to `jsonb`: - -```sql -db=# select jsonb_pretty(cts::jsonb) from test_table; - jsonb_pretty ---------------------------------------------------- - { + - "map": { + - "2": "2018-09-23T19:24:52.118583+02:00" + - }, + - "source": "current", + - "default": "2018-09-23T19:24:52.118583+02:00"+ - } -(1 row) -``` - -- `bdr.column_timestamps_resolve(bdr.column_timestamps, xid)` - - This function updates the mapping with the commit timestamp for the attributes modified by the most recent transaction if it already committed. This matters only when using the commit timestamp. 
For example, in this case, the last transaction updated the second attribute (with `attnum = 2`): - -```sql -test=# select cts::jsonb from test_table; - cts ----------------------------------------------------------------------------------------------------------------------------------------- - {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00", "modified": [2]} -(1 row) - -db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; - column_timestamps_resolve ------------------------------------------------------------------------------------------------------------------------ - {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00"} -(1 row) -``` - -## Handling column conflicts using CRDT data types - -By default, column-level conflict resolution picks the value with a higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. For example, you can use CRDT types that allow merging the conflicting values without discarding any information. - -## Notes - -- The attributes modified by an `UPDATE` are determined by comparing the old and new row in a trigger. This means that if the attribute doesn't change a value, it isn't detected as modified even if it's explicitly set. For example, `UPDATE t SET a = a` doesn't mark `a` as modified for any row. Similarly, `UPDATE t SET a = 1` doesn't mark `a` as modified for rows that are already set to `1`. - -- For `INSERT` statements, there's no old row to compare the new one to, so all attributes are considered to be modified, and they are assigned a new timestamp. This condition applies even for columns that weren't included in the `INSERT` statement and received default values. PGD can detect the attributes that have a default value but can't know if it was included automatically or specified explicitly. 
- - This situation effectively means column-level conflict resolution doesn't work for `INSERT-INSERT` conflicts even if the `INSERT` statements specify different subsets of columns. The newer row has timestamps that are all newer than the older row. - -- By treating the columns independently, it's easy to violate constraints in a way that isn't possible when all changes happen on the same node. Consider, for example, a table like this: - - ```sql - CREATE TABLE t (id INT PRIMARY KEY, a INT, b INT, CHECK (a > b)); - INSERT INTO t VALUES (1, 1000, 1); - ``` - - Assume one node does: - - ```sql - UPDATE t SET a = 100; - ``` - - Another node concurrently does: - - ```sql - UPDATE t SET b = 500; - ``` - - Each of those updates is valid when executed on the initial row and so passes on each node. But when replicating to the other node, the resulting row violates the `CHECK (a > b)` constraint, and the replication stops until the issue is resolved manually. - -- The column storing timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. (The value is ignored where possible.) - -- The timestamp mapping is maintained by triggers, but the order in which triggers execute matters. So if you have custom triggers that modify tuples and are executed after the `bdr_clcd_` triggers, the modified columns aren't detected correctly. - -- When using regular timestamps to order changes or commits, the conflicting changes might have exactly the same timestamp because two or more nodes happened to generate the same timestamp. This risk isn't unique to column-level conflict resolution, as it can happen even for regular row-level conflict resolution. The node id is the tie breaker in this situation. The higher node id wins. This approach ensures that the same changes are applied on all nodes. - -- A clock skew can occur between different nodes. 
It can induce somewhat unexpected behavior, discarding seemingly newer changes because the timestamps are inverted. However, you can manage clock skew between nodes using the parameters `bdr.maximum_clock_skew` and `bdr.maximum_clock_skew_action`. - -```sql -SELECT bdr.alter_node_group_config('group', ignore_redundant_updates := false); -``` - diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index f284f333573..131715dc1ad 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -52,7 +52,7 @@ Avoid using columns with the `bdr.column_timestamps` data type for other purpose ## Listing tables with column-level conflict resolution -You can list tables having column-level conflict resolution enabled with the following query. This query detects the presence of a column of type `bdr.column_timestamp`. +You can list tables having column-level conflict resolution enabled with the following query. ```sql SELECT nc.nspname, c.relname @@ -68,3 +68,4 @@ WHERE NOT pg_is_other_temp_schema(nc.oid) AND c.relkind IN ('r', 'v', 'f', 'p'); ``` +This query detects the presence of a column of type `bdr.column_timestamps`. diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx index 844ca2eba89..6b7383872f7 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx @@ -13,9 +13,7 @@ When you select one of the two column-level conflict detection methods, a column The column storing timestamp mapping is managed automatically. 
Don't specify or override the value in your queries, as the results can be unpredictable. Where possible, user attempts to do so will be ignored. -When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. - -This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. +When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes. ### `column_modify_timestamp` diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx index 7a4c88c8684..5d379171bae 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/index.mdx @@ -9,8 +9,8 @@ By default, conflicts are resolved at row level. When changes from two nodes con However, it might sometimes be appropriate to resolve conflicts at the column level rather than the row level, at least in some cases. -* [Overview](01_overview_clcd) introduces column-level conflict resolution in contrast to row-level conflict resolution. +* [Overview](01_overview_clcd) introduces column-level conflict resolution in contrast to row-level conflict resolution, suggesting where it might be a better fit than row-level conflict resolution. 
-* [Enabling and disabling](02_enabling_disabling) provides an example of enabling column-level conflict resolution and introduces [`bdr.column_timestamps_create`](02_enabling_disabling/#bdrcolumn_timestamps_create). +* [Enabling and disabling](02_enabling_disabling) provides an example of enabling column-level conflict resolution and explains how to list tables with column-level conflict resolution enabled. * [Timestamps](03_timestamps) explicates the difference between using `column_modify_timestamp` and `column_commit_timestamp` and shows how the timestamps associated with column-level conflict resolution can be selected and inspected. From 256a366a16d06b85e9f696e8625eb8df93fb5cc3 Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Fri, 2 Aug 2024 14:59:00 -0400 Subject: [PATCH 27/31] Some change in reference index.mdx. --- product_docs/docs/pgd/5/reference/index.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/product_docs/docs/pgd/5/reference/index.mdx b/product_docs/docs/pgd/5/reference/index.mdx index 082ad28f414..3bb316558e7 100644 --- a/product_docs/docs/pgd/5/reference/index.mdx +++ b/product_docs/docs/pgd/5/reference/index.mdx @@ -190,7 +190,7 @@ The reference section is a definitive listing of all functions, views, and comma * [`bdr.min_worker_backoff_delay`](pgd-settings#bdrmin_worker_backoff_delay) ### [CRDTs](pgd-settings#crdts) * [`bdr.crdt_raw_value`](pgd-settings#bdrcrdt_raw_value) -### [Commit Scope](pgd-settings#commit-scope) +### [Commit scope](pgd-settings#commit-scope) * [`bdr.commit_scope`](pgd-settings#bdrcommit_scope) ### [Commit At Most Once](pgd-settings#commit-at-most-once) * [`bdr.camo_local_mode_delay`](pgd-settings#bdrcamo_local_mode_delay) @@ -213,7 +213,7 @@ The reference section is a definitive listing of all functions, views, and comma * [`bdr.track_subscription_apply`](pgd-settings#bdrtrack_subscription_apply) * [`bdr.track_relation_apply`](pgd-settings#bdrtrack_relation_apply) * 
[`bdr.track_apply_lock_timing`](pgd-settings#bdrtrack_apply_lock_timing) -### [Decoding Worker](pgd-settings#decoding-worker) +### [Decoding worker](pgd-settings#decoding-worker) * [`bdr.enable_wal_decoder`](pgd-settings#bdrenable_wal_decoder) * [`bdr.receive_lcr`](pgd-settings#bdrreceive_lcr) * [`bdr.lcr_cleanup_interval`](pgd-settings#bdrlcr_cleanup_interval) From 92e36b91863ed8e4763f6c2f5360f99a494c24fa Mon Sep 17 00:00:00 2001 From: Josh Earlenbaugh Date: Fri, 2 Aug 2024 15:14:18 -0400 Subject: [PATCH 28/31] Add note about PGD proxy as a way to avoid conflicts. --- product_docs/docs/pgd/5/consistency/index.mdx | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/index.mdx b/product_docs/docs/pgd/5/consistency/index.mdx index 901a7b4fbbf..45c1bea1a23 100644 --- a/product_docs/docs/pgd/5/consistency/index.mdx +++ b/product_docs/docs/pgd/5/consistency/index.mdx @@ -18,5 +18,4 @@ By default, conflicts are resolved at the row level. When changes from two nodes Column-level conflict detection and resolution is available with PGD, described in [CLCD](column-level-conflicts). -If you want to avoid conflicts, you can use [Group Commit](/pgd/latest/durability/group-commit) with -[Eager conflict resolution](eager) or conflict-free data types (CRDTs), described in [CRDT](crdt). +If you want to avoid conflicts, you can use [Group Commit](/pgd/latest/durability/group-commit) with [Eager conflict resolution](eager) or conflict-free data types (CRDTs), described in [CRDT](crdt). You can also use PGD Proxy and route all writes to one write-leader, eliminating the chance for inter-nodal conflicts. 
From 9a4ce27c09481707e0a2c4fb1643210a35b96aea Mon Sep 17 00:00:00 2001 From: Dj Walker-Morgan <126472455+djw-m@users.noreply.github.com> Date: Tue, 6 Aug 2024 16:21:55 +0100 Subject: [PATCH 29/31] Removed redirect 01_overview_clcd.mdx --- .../5/consistency/column-level-conflicts/01_overview_clcd.mdx | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx index 9d032f3df8f..0c810550011 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/01_overview_clcd.mdx @@ -1,8 +1,6 @@ --- navTitle: Overview title: Overview -redirects: - - /pgd/latest/bdr/column-level-conflicts/ --- By default, conflicts are resolved at row level. When changes from two nodes conflict, either the local or remote tuple is selected and the other is discarded. For example, commit timestamps for the two conflicting changes might be compared and the newer one kept. This approach ensures that all nodes converge to the same result and establishes commit-order-like semantics on the whole cluster. @@ -64,4 +62,4 @@ Each of those updates is valid when executed on the initial row and so passes on ## Handling column-level conflicts using CRDT data types -By default, column-level conflict resolution picks the value with a higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. For example, you can use [CRDT types](../crdt) that allow merging the conflicting values without discarding any information. \ No newline at end of file +By default, column-level conflict resolution picks the value with a higher timestamp and discards the other one. You can, however, reconcile the conflict in different, more elaborate ways. 
For example, you can use [CRDT types](../crdt) that allow merging the conflicting values without discarding any information. From fff43727b187eb6885e26fcfa05d1aec8e3cb53e Mon Sep 17 00:00:00 2001 From: Dj Walker-Morgan <126472455+djw-m@users.noreply.github.com> Date: Tue, 6 Aug 2024 16:22:27 +0100 Subject: [PATCH 30/31] Remove redirect 02_enabling_disabling.mdx --- .../column-level-conflicts/02_enabling_disabling.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx index 131715dc1ad..a145d1d67a7 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/02_enabling_disabling.mdx @@ -2,8 +2,6 @@ navTitle: Enabling and disabling title: Enabling and disabling column-level conflict resolution deepToC: true -redirects: - - /pgd/latest/bdr/column-level-conflicts/ --- !!! 
Note Permissions required From bbadc59c235a4d1ed33bfaa0759e8f00b470e782 Mon Sep 17 00:00:00 2001 From: Dj Walker-Morgan <126472455+djw-m@users.noreply.github.com> Date: Tue, 6 Aug 2024 16:23:27 +0100 Subject: [PATCH 31/31] Remove redirect 03_timestamps.mdx --- .../5/consistency/column-level-conflicts/03_timestamps.mdx | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx index 6b7383872f7..350e2b9cd54 100644 --- a/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx +++ b/product_docs/docs/pgd/5/consistency/column-level-conflicts/03_timestamps.mdx @@ -1,8 +1,6 @@ --- navTitle: Timestamps title: Timestamps in column-level conflict resolution -redirects: - - /pgd/latest/bdr/column-level-conflicts/ --- As previously mentioned, column-level conflict resolution depends on a timestamp column being included in the table. @@ -96,4 +94,4 @@ db=# select bdr.column_timestamps_resolve(cts, xmin)::jsonb from test_table; ----------------------------------------------------------------------------------------------------------------------- {"map": {"2": "2018-09-23T19:29:55.581823+02:00"}, "source": "commit", "default": "2018-09-23T19:29:55.581823+02:00"} (1 row) -``` \ No newline at end of file +```
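
As a rough illustration of how a per-column timestamp mapping like the one inspected above could drive conflict resolution, the following Python sketch contrasts row-level and column-level resolution for the `UPDATE t SET a = 100` / `UPDATE t SET b = 100` example from the overview. It is a conceptual model only, not PGD code. The dictionary layout (`values`, `cts`, `node_id`), the integer timestamps, and the function names are assumptions made for this sketch. Treating `default` as the timestamp of any column missing from `map` is an interpretation of the JSONB output, not documented behavior. The tiebreak on node ID follows the note that the higher node ID wins.

```python
def column_ts(row, attnum):
    """Timestamp of one column: its entry in `map`, else the row's `default`."""
    cts = row["cts"]
    return cts["map"].get(attnum, cts["default"])


def resolve_row_level(local, remote):
    """Row-level sketch: keep the whole row whose latest change is newest."""
    def row_key(row):
        newest = max([row["cts"]["default"], *row["cts"]["map"].values()])
        # Node ID breaks ties so both nodes pick the same winner.
        return (newest, row["node_id"])

    return dict(max(local, remote, key=row_key)["values"])


def resolve_column_level(local, remote):
    """Column-level sketch: pick each column from the side that changed it last."""
    merged = {}
    for attnum in local["values"]:
        winner = max(
            local,
            remote,
            key=lambda row: (column_ts(row, attnum), row["node_id"]),
        )
        merged[attnum] = winner["values"][attnum]
    return merged


# Both nodes start from the row (a=1, b=1) written at timestamp 0.
# Columns are keyed by attribute number: 1 is a, 2 is b.
node1 = {  # ran UPDATE t SET a = 100 at timestamp 10
    "node_id": 1,
    "values": {1: 100, 2: 1},
    "cts": {"default": 0, "map": {1: 10}},
}
node2 = {  # ran UPDATE t SET b = 100 at timestamp 20
    "node_id": 2,
    "values": {1: 1, 2: 100},
    "cts": {"default": 0, "map": {2: 20}},
}

print(resolve_row_level(node1, node2))     # {1: 1, 2: 100}
print(resolve_column_level(node1, node2))  # {1: 100, 2: 100}
```

In this model the column-level result is the same whichever node is treated as local, which mirrors the requirement that all nodes converge on the same row.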