[BACKPORT 2.18.5][#20329] DocDB: Early abort ReadCommitted transactio…

…n that will eventually get restarted Summary: Original commit: f2b9e28 / D31126 The query layer retry logic, in postgres.c, for `kConflict` errors for RC transactions is as follows: ``` if (retry_possible) { // which is determined based on certain conditions ... if (IsInTransactionBlock(true /* isTopLevel */)) { // We seem to hit this in RC when the a statement within - begin; ... commit; faces a kConflict RollbackAndReleaseCurrentSubTransaction(); yb_maybe_sleep_on_txn_conflict(); } else { // We seem to hit this block for single statement transactions (both fast path and distributed transactions). // In this case, no internal sub-txn has been created for RC transations, so the sub-txn id is still 1. // Since there is no internal sub-txn, we can't rollback the statement and have to retry the whole transaction. yb_maybe_sleep_on_txn_conflict(); } ... } // next rpc ``` In the common case of executing a statement inside a transaction block in RC, `IsInTransactionBlock()` evaluates to true and the query layer rolls back the statement before restarting the statement. However, in case where a single statement read committed transaction faces a `kConflict` in `Fail-on-Conflict` concurrency control, the query layer restarts the whole transaction after sleeping for some time (i.e., the `else` case above). The next rpc restarts the transaction, which is when the abort of the old transaction is done in `PgClientSession` (query layer). During the sleep, we would treat the (old) transaction as active and needlessly `kConflict` other transactions for which this old transaction is a blocker. In this case, since we anyways know that the (old) transaction would be aborted in the next rpc, we could early abort it as well, improving the throughout of the system. The solution is to abort RC transactions in `PgClientSession` if a `kConflict`/`kReadRestart` is received and we know that this was a single statement transaction (i.e., sub-txn id is still `1`). Jira: DB-9315 Test Plan: Jenkins Unit test that fails without the changes: ./yb_build.sh --cxx-test='TEST_F_EX(PgOnConflictTest, EarlyAbortSingleStatementReadCommittedTxn, PgFailOnConflictTest) {' -n 4 --tp 1 Manual test: Create a cluster using: ``` ./bin/yb-ctl create --rf=3 --data_dir ~/yugabyte-data --tserver_flags 'yb_enable_read_committed_isolation=true,enable_wait_queues=false' ``` Setup a table using: ``` $./bin/ysqlsh $create table test (k int primary key, v1 int, v2 int); $create index idx on test (v1); $create index idx2 on test (v2); $insert into test values (1, 1, 1); ``` Create a file with contents as follows ``` $cat update.sql update test set v1=2, v2=1 where k=1; ``` Launch 2 jobs executing the aboe file ``` ./build/latest/postgres/bin/ysql_bench --jobs=2 --client=2 --file=update.sql --progress=10 --time=60 -d fails ``` Without the changes, the throughout drops to 0 quickly. With the changes in the diff, the throughput maintains. Reviewers: rsami, pjain, mtakahara, bkolagani Reviewed By: pjain Subscribers: ybase, yql, smishra Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D31374
yugabyte · Dec 28, 2023 · b24a9f5 · b24a9f5
1 parent 4bf81c0
commit b24a9f5
Show file tree

Hide file tree

Showing 2 changed files with 82 additions and 12 deletions.
diff --git a/src/yb/client/transaction.cc b/src/yb/client/transaction.cc
@@ -166,6 +166,29 @@ Result<ChildTransactionData> ChildTransactionData::FromPB(const ChildTransaction
   return result;
 }
 
+bool CanAbortTransaction(const Status& status,
+                         const TransactionMetadata& txn_metadata,
+                         const YBSubTransaction& subtransaction) {
+  // We don't abort the transaction in the following scenarios:
+  // 1. When we face a kSkipLocking error, so as to make further progress.
+  // 2. When running a transaction with READ COMMITTED isolation where the current subtransaction
+  //    id is not at the default value, and we face a kConflict/kReadRestart error. This is because
+  //    YB PG backend retries kConflict and kReadRestart errors for READ COMMITTED isolation by
+  //    restarting statements instead of the whole transaction if the statment is in a transactional
+  //    block. Else it restarts the whole transaction, in which case we can abort the current one.
+  //
+  //    See 'IsInTransactionBlock' function in src/postgres/src/backend/tcop/postgres.c for details.
+  const TransactionError txn_err(status);
+  if (txn_err.value() == TransactionErrorCode::kSkipLocking) {
+    return false;
+  }
+  if (txn_metadata.isolation == IsolationLevel::READ_COMMITTED && subtransaction.active()) {
+    return txn_err.value() != TransactionErrorCode::kReadRestartRequired &&
+           txn_err.value() != TransactionErrorCode::kConflict;
+  }
+  return true;
+}
+
 YB_DEFINE_ENUM(MetadataState, (kMissing)(kMaybePresent)(kPresent));
 
 std::string YBSubTransaction::ToString() const {
@@ -498,24 +521,13 @@ class YBTransaction::Impl final : public internal::TxnBatcherIf {
           }
         }
       } else {
-        const TransactionError txn_err(status);
-        // We don't abort the txn in case of a kSkipLocking error to make further progress.
-        // READ COMMITTED isolation retries errors of kConflict and kReadRestart by restarting
-        // statements instead of the whole txn and hence should avoid aborting the txn in this case
-        // too.
-        bool avoid_abort =
-            (txn_err.value() == TransactionErrorCode::kSkipLocking) ||
-            (metadata_.isolation == IsolationLevel::READ_COMMITTED &&
-              (txn_err.value() == TransactionErrorCode::kReadRestartRequired ||
-                txn_err.value() == TransactionErrorCode::kConflict));
-        if (!avoid_abort) {
+        if (CanAbortTransaction(status, metadata_, subtransaction_)) {
           auto state = state_.load(std::memory_order_acquire);
           VLOG_WITH_PREFIX(4) << "Abort desired, state: " << AsString(state);
           if (state == TransactionState::kRunning) {
             abort = true;
             // State will be changed to aborted in SetError
           }
-
           SetErrorUnlocked(status, "Flush");
         }
       }

diff --git a/src/yb/yql/pgwrapper/pg_on_conflict-test.cc b/src/yb/yql/pgwrapper/pg_on_conflict-test.cc
@@ -32,6 +32,17 @@ class PgOnConflictTest : public LibPqTestBase {
   void TestOnConflict(bool kill_master, const MonoDelta& duration);
 };
 
+class PgFailOnConflictTest : public PgOnConflictTest {
+ protected:
+  void UpdateMiniClusterOptions(ExternalMiniClusterOptions* opts) override {
+    PgOnConflictTest::UpdateMiniClusterOptions(opts);
+    // This test depends on fail-on-conflict concurrency control to perform its validation.
+    // TODO(wait-queues): https://github.com/yugabyte/yugabyte-db/issues/17871
+    opts->extra_tserver_flags.push_back("--enable_wait_queues=false");
+    opts->extra_tserver_flags.push_back("--yb_enable_read_committed_isolation=true");
+  }
+};
+
 namespace {
 
 struct OnConflictKey {
@@ -425,5 +436,52 @@ TEST_F(PgOnConflictTest, YB_DISABLE_TEST_IN_TSAN(ValidSessionAfterTxnCommitConfl
   ASSERT_EQ(value, 1);
 }
 
+// When a single statement Read Committed transaction executed outside of a begin block faces a
+// kConflict, PG backend could sleep for a while delaying the next rpc which restarts the txn.
+// PgClientSession early aborts such transactions before returning kConflict to the PG backend
+// so as to progress other transactions waiting on a set of locks that could have been acquired
+// by the former transaction. The below test asserts that such transactions are early aborted at
+// PgClientSession.
+//
+// Note: The test is intended to be run with Fail-On-Conflict conflict management policy because
+// we only sleep between query layer retries in Fail-on-Conflict mode.
+TEST_F_EX(PgOnConflictTest, EarlyAbortSingleStatementReadCommittedTxn, PgFailOnConflictTest) {
+  constexpr int kClients = 3;
+  constexpr int kIters = 100;
+  constexpr int kStatementTimeoutMs = 10000 * kTimeMultiplier;
+
+  auto conn = ASSERT_RESULT(Connect());
+  ASSERT_OK(conn.Execute("CREATE TABLE foo(k INT PRIMARY KEY, v INT)"));
+  ASSERT_OK(conn.Execute("INSERT INTO foo SELECT generate_series(1, 10), 0"));
+
+  ASSERT_OK(conn.StartTransaction(IsolationLevel::READ_COMMITTED));
+  ASSERT_OK(conn.Fetch("SELECT * FROM foo WHERE k=1 FOR UPDATE"));
+
+  TestThreadHolder thread_holder;
+  for (int i = 0; i < kClients; i++) {
+    thread_holder.AddThreadFunctor([&]{
+      auto conn = ASSERT_RESULT(Connect());
+      ASSERT_OK(conn.ExecuteFormat("SET statement_timeout=$0", kStatementTimeoutMs * 2));
+      while (!thread_holder.stop_flag()) {
+        // RPCs to different tablets would be made in parallel, so the transaction would obtain
+        // locks at a few tablets and kConflict at the rest. PG backend would retry the transaction
+        // for a couple of times with sleeps in between, before the statement timesout.
+        ASSERT_NOK(conn.Execute("UPDATE foo SET v=1 WHERE k>=1"));
+      }
+    });
+  }
+
+  thread_holder.AddThreadFunctor([&]{
+    auto conn = ASSERT_RESULT(Connect());
+    ASSERT_OK(conn.ExecuteFormat("SET statement_timeout=$0", kStatementTimeoutMs / 2));
+    for (int i = 0; i < kIters; i++) {
+      // Since PgClientSession would early abort the above conflicting transactions before the sleep
+      // amidst retries in the backend, this statement should get enough window for the updates.
+      ASSERT_OK(conn.Execute("UPDATE foo SET v=1 WHERE k>=2"));
+    }
+  });
+  thread_holder.Wait(30s * kTimeMultiplier);
+}
+
 } // namespace pgwrapper
 } // namespace yb