[BACKPORT 2.18.5][#20329] DocDB: Early abort ReadCommitted transaction that will eventually get restarted

Summary:
Original commit: f2b9e28 / D31126

The query layer retry logic, in postgres.c, for `kConflict` errors for RC transactions is as follows:
```
if (retry_possible) { // which is determined based on certain conditions
  ...
  if (IsInTransactionBlock(true /* isTopLevel */)) {
    // We seem to hit this in RC when a statement within - begin; ... commit; faces a kConflict
    RollbackAndReleaseCurrentSubTransaction();
    yb_maybe_sleep_on_txn_conflict();
  } else {
    // We seem to hit this block for single statement transactions (both fast path and distributed transactions).
    // In this case, no internal sub-txn has been created for RC transactions, so the sub-txn id is still 1.
    // Since there is no internal sub-txn, we can't rollback the statement and have to retry the whole transaction.
    yb_maybe_sleep_on_txn_conflict();
  }
  ...
}
// next rpc
```

In the common case of executing a statement inside a transaction block in RC, `IsInTransactionBlock()` evaluates to true and the query layer rolls back the statement before restarting it.

However, when a single-statement read committed transaction faces a `kConflict` under `Fail-on-Conflict` concurrency control, the query layer restarts the whole transaction after sleeping for some time (i.e., the `else` case above). The next rpc restarts the transaction, which is when the old transaction is aborted in `PgClientSession` (query layer).

During the sleep, the (old) transaction is still treated as active, so other transactions for which it is a blocker needlessly receive `kConflict`. Since we already know that the (old) transaction will be aborted in the next rpc, we can abort it early instead, improving the throughput of the system.

The solution is to abort RC transactions in `PgClientSession` if a `kConflict`/`kReadRestart` is received and we know that this was a single-statement transaction (i.e., sub-txn id is still `1`).
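The early-abort condition described above can be sketched as a small predicate. This is only an illustration, not the code in `PgClientSession`: the enum types, the `ShouldEarlyAbort` function, and the `kMinSubTransactionId` constant are all hypothetical names introduced here, and the only facts taken from the summary are the three conditions (RC isolation, a `kConflict`/`kReadRestart` error, and a sub-txn id that is still `1`).

```cpp
#include <cstdint>

// Hypothetical types for illustration only; these are not YugabyteDB APIs.
enum class IsolationLevel { kReadCommitted, kSnapshot, kSerializable };
enum class ErrorKind { kOk, kConflict, kReadRestart, kOther };

// Assumption taken from the summary: a sub-txn id of 1 means no internal
// sub-transaction was created, i.e. this is a single-statement transaction.
constexpr std::uint32_t kMinSubTransactionId = 1;

// Returns true when the old transaction can be aborted right away, because
// the query layer is going to restart the whole transaction anyway.
constexpr bool ShouldEarlyAbort(IsolationLevel isolation, ErrorKind error,
                                std::uint32_t active_subtxn_id) {
  return isolation == IsolationLevel::kReadCommitted &&
         (error == ErrorKind::kConflict || error == ErrorKind::kReadRestart) &&
         active_subtxn_id == kMinSubTransactionId;
}
```

With this predicate, a statement inside a `begin; ... commit;` block (sub-txn id greater than `1`) is not aborted early, since the query layer only rolls back that statement rather than the whole transaction.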
Jira: DB-9315

Test Plan:
Jenkins

Unit test that fails without the changes:
./yb_build.sh --cxx-test='TEST_F_EX(PgOnConflictTest, EarlyAbortSingleStatementReadCommittedTxn, PgFailOnConflictTest) {' -n 4 --tp 1

Manual test:

Create a cluster using:
```
./bin/yb-ctl create --rf=3 --data_dir ~/yugabyte-data --tserver_flags 'yb_enable_read_committed_isolation=true,enable_wait_queues=false'
```

Set up a table using:
```
$./bin/ysqlsh
$create table test (k int primary key, v1 int, v2 int);
$create index idx on test (v1);
$create index idx2 on test (v2);
$insert into test values (1, 1, 1);
```

Create a file with the following contents:
```
$cat update.sql
update test set v1=2, v2=1 where k=1;
```

Launch 2 jobs executing the above file:
```
./build/latest/postgres/bin/ysql_bench --jobs=2 --client=2 --file=update.sql --progress=10 --time=60 -d fails
```

Without the changes, the throughput drops to 0 quickly. With the changes in the diff, the throughput is sustained.

Reviewers: rsami, pjain, mtakahara, bkolagani

Reviewed By: pjain

Subscribers: ybase, yql, smishra

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31374