
Cherry Pick Fix orphaned temp namespace catalog entry left on coordinator #882

Merged
merged 17 commits into from
Jan 22, 2025

Conversation

yjhjstz
Member

@yjhjstz yjhjstz commented Jan 21, 2025

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


dreamedcheng and others added 16 commits January 20, 2025 15:41
Note that commit e88ceb8 fixed the orphaned temp table left on the
coordinator, but it didn't handle the orphaned namespace catalog
entry left on the coordinator. This commit fixes that.
Backported from Postgres REL_13_BETA1
postgres/postgres@8961355

Original commit message:

Buildfarm member chipmunk has failed twice due to taking >30s, and
twenty-four runs of other members have used >5s.  The test is new in
v13, so no back-patch.
**Issue:**
Currently `gpexpand` errors out whenever it is run using a user-created input file (not created using the `gpexpand` interview process) on a cluster that has `custom tablespaces` created with the following error -
```
$ cat gpexpand_inputfile_20230914_201220
jnihal3MD6M.vmware.com|jnihal3MD6M.vmware.com|7005|/tmp/demoDataDir3|5|3|p

$ gpexpand -i gpexpand_inputfile_20230914_201220
20230914:20:13:04:066896 gpexpand:jnihal3MD6M:jnihal-[ERROR]:-gpexpand failed: [Errno 2] No such file or directory: 'gpexpand_inputfile_20230914_201220.ts'
```

**RCA:**
This is happening due to commit 9b70ba8698f656c40ee62e5519314f7db1e4655e, which introduced a change requiring `gpexpand` to have a separate tablespace input configuration file (`<input_file>.ts`) whenever there are `custom tablespaces` in the database. However, this file is only created when the user uses the `gpexpand` interview process to create the input file.

In cases where the user manually creates the input file, the tablespace file is missing which causes the above error.

**Fix:**
**Fix:**
Add a check in the `read_tablespace_file()` function to verify whether the file is present. If it is not, create the file automatically and exit from the process, giving users a chance to review it (in case they want to change the `tablespace` location), and prompt them to re-run `gpexpand`.

The call to `read_tablespace_file()` is also moved before we start the expansion process, so that we exit before the expansion begins and the user does not have to `rollback` when they re-run `gpexpand`.
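The described check can be sketched roughly as follows. This is a minimal illustration, not the actual gpexpand code: the function and helper names, the template contents, and the exact messages are assumptions.

```python
import os
import sys


def write_default_tablespace_file(ts_file):
    """Write a template tablespace input file (illustrative contents)."""
    with open(ts_file, "w") as f:
        f.write("# tablespace_name|location\n")


def read_tablespace_file(input_file):
    """Check for the tablespace input configuration file before expansion.

    If '<input_file>.ts' is missing, write a template and exit so the
    user can review it and re-run gpexpand (sketch only).
    """
    ts_file = input_file + ".ts"
    if not os.path.exists(ts_file):
        write_default_tablespace_file(ts_file)
        print("[WARNING] Could not locate tablespace input configuration "
              "file '%s'. A new one has been written. Please review it and "
              "re-run with: gpexpand -i %s" % (ts_file, input_file))
        # Exit before the expansion starts, so no rollback is needed.
        sys.exit(1)
    with open(ts_file) as f:
        return f.read().splitlines()
```

Calling this before the expansion begins is what makes the exit safe: the cluster state has not yet been modified, so re-running with the reviewed file is all the user has to do.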

```
$ gpexpand -i gpexpand_inputfile_20230914_201220
20230914:20:24:00:014186 gpexpand:jnihal3MD6M:jnihal-[WARNING]:-Could not locate tablespace input configuration file 'gpexpand_inputfile_20230914_201220.ts'. A new tablespace input configuration file is written to 'gpexpand_inputfile_20230914_201220.ts'. Please review the file and re-run with: gpexpand -i gpexpand_inputfile_20230914_201220
20230914:20:24:00:014186 gpexpand:jnihal3MD6M:jnihal-[INFO]:-Exiting...

$ gpexpand -i gpexpand_inputfile_20230914_201220   --> re-run with the same input file
```
…Y is not set. (#16433)

* Fix: utilities do not honor -d flag when COORDINATOR_DATA_DIRECTORY is not set.

Issue: The following gpMgmt utilities do not honor the -d flag when
COORDINATOR_DATA_DIRECTORY is not set:
1. gpstart
2. gpstop
3. gpstatus
4. gprecoverseg
5. gpaddmirror

RCA: To get the coordinator data directory, the gp.getcoordinator_datadir()
function is called from the above-listed utilities. The function has no
provision to return a coordinator data directory provided with the -d flag;
currently it only looks for the COORDINATOR_DATA_DIRECTORY and
MASTER_DATA_DIRECTORY environment variables.

Also, some of the utilities created lock files before parsing the
provided options, a design flaw that caused the utilities to crash
while looking for the coordinator data directory.

Fix: Added a global flag that holds the data directory provided with the -d
option. When a utility parses its options, it sets the flag with the
provided data directory, and that value is returned when
gp.getcoordinator_datadir() is called.

Test:
Added behave test cases to use the provided data directory if COORDINATOR_DATA_DIRECTORY
is not set.
Added a behave test case to check that the provided coordinator data directory
is preferred over an already set COORDINATOR_DATA_DIRECTORY environment
variable. It is tested by setting a wrong COORDINATOR_DATA_DIRECTORY;
when the utility is run with the correct data directory via the -d option,
it should execute successfully.
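The global-flag fix can be sketched like this. It is a simplified stand-in for the real `gp.getcoordinator_datadir()`; the module-level variable and the setter name are illustrative assumptions, not the actual gpMgmt code.

```python
import os

# Module-level flag set during option parsing; holds the -d value if given
# (illustrative name, not the real gpMgmt variable).
_option_datadir = None


def set_coordinator_datadir(datadir):
    """Record the data directory supplied with the -d option."""
    global _option_datadir
    _option_datadir = datadir


def getcoordinator_datadir():
    """Prefer the -d option value over the environment variables."""
    if _option_datadir:
        return _option_datadir
    datadir = (os.environ.get("COORDINATOR_DATA_DIRECTORY")
               or os.environ.get("MASTER_DATA_DIRECTORY"))
    if not datadir:
        raise EnvironmentError(
            "COORDINATOR_DATA_DIRECTORY not set and no -d option given")
    return datadir
```

This matches the behavior the behave tests verify: a -d value wins even when a (possibly wrong) COORDINATOR_DATA_DIRECTORY is set.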
This should have been done with #16428, but we need to disable autovacuum when
running the gp_check_files regress test. Otherwise we might see errors like:

```
@@ -53,12 +53,8 @@
 -- check orphaned files, note that this forces a checkpoint internally.
 set client_min_messages = ERROR;
 select gp_segment_id, filename from run_orphaned_files_view();
- gp_segment_id | filename
----------------+----------
-             1 | 987654
-             1 | 987654.3
-(2 rows)
-
+ERROR:  failed to retrieve orphaned files after 10 minutes of retries.
+CONTEXT:  PL/pgSQL function run_orphaned_files_view() line 19 at RAISE
 reset client_min_messages;
```

In the log we have:
```
2023-09-20 15:33:00.766420 UTC,"gpadmin","regression",p148081,th-589358976,"[local]",,2023-09-20 15:31:39 UTC,0,con19,cmd65,seg-1,,dx38585,,sx1,"LOG","00000","attempt failed 17 with error: There is a client session running on one or more segment. Aborting...",,,,,"PL/pgSQL function run_orphaned_files_view() line 11 at RAISE","select gp_segment_id, filename from run_orphaned_files_view();",0,,"pl_exec.c",3857,

```

It is possible that some background jobs have created backends that should be
avoided when taking the gp_check_orphaned_files view. As we have decided to
make the view conservative (disallowing any backends that could cause false
positives in the view results), fixing the test is what we need.

In the test we have a safeguard that loops for 10 minutes and takes the view
repeatedly (function run_orphaned_files_view()). But that didn't solve the
issue, because the function saw only one snapshot of pg_stat_activity during
its entire execution. Now we explicitly call pg_stat_clear_snapshot() to solve
that issue.
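The retry logic can be sketched as a Python analogue of the PL/pgSQL helper. `run_query` is a hypothetical wrapper around a database connection, and the retry counts are illustrative; the point is that the stats snapshot must be cleared on every attempt, or each retry sees the same stale `pg_stat_activity` contents.

```python
import time


def run_orphaned_files_view(run_query, retries=60, delay=10):
    """Retry taking the orphaned-files view until no client sessions
    interfere, clearing the stats snapshot between attempts (sketch).
    """
    for attempt in range(retries):
        # Without this, every attempt would see the same (stale) snapshot
        # of pg_stat_activity and keep failing for the full loop duration.
        run_query("SELECT pg_stat_clear_snapshot()")
        try:
            return run_query(
                "SELECT gp_segment_id, filename FROM gp_check_orphaned_files")
        except RuntimeError:
            time.sleep(delay)
    raise RuntimeError(
        "failed to retrieve orphaned files after %d attempts" % retries)
```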

Co-authored-by: Ashwin Agrawal [email protected]
The comments for create_split_update_path:
```
GPDB_96_MERGE_FIXME: the below comment is obsolete. Nowadays, SplitUpdate
computes the new row's hash, and the corresponding. target segment. The
old segment comes from the gp_segment_id junk column. But ORCA still
does it the old way!

Third, to support deletion, and hash delete operation to correct segment,
we need to get attributes of OLD tuple. The old attributes must therefore
be present in the subplan's target list. That is handled earlier in the
planner, in expand_targetlist().
```
For the planner we use an Explicit Redistribute Motion instead of a Redistribute Motion above the Split node,
so the old attributes are useless for computing the target segment id of the Redistribute Motion.
Remove the obsolete comments.
… exceeded." (#16388)

When users try to select a large number of rows via psql, they may encounter the following error:
ERROR: "Cannot add cell to table content: total cell count of XXX exceeded."

The root cause is an overflow of the int type in printTableAddCell().
This commit fixes it.

(This issue also exists upstream; I have reported it:
https://www.postgresql.org/message-id/flat/TYBP286MB0351B057B101C90D7C1239E6B4E2A%40TYBP286MB0351.JPNP286.PROD.OUTLOOK.COM)
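The wraparound can be illustrated by emulating 32-bit signed arithmetic in Python. This is only a sketch of the failure mode; the actual fix is in psql's C code, and the row/column counts below are made-up illustrative numbers.

```python
def to_int32(n):
    """Wrap an integer to the range of a 32-bit signed int, as C int would."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


# A result set of ~430 million rows with 5 columns pushes the total cell
# count past INT_MAX (2,147,483,647), so a 32-bit counter goes negative.
rows, cols = 430_000_000, 5
cells_64bit = rows * cols
cells_32bit = to_int32(rows * cols)

assert cells_64bit == 2_150_000_000
assert cells_32bit < 0  # wrapped around
```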
Add the duration to the "canceling statement due to user request" message to help
users debug canceled queries.

The log message will be

2023-10-10 03:32:55.325528 UTC,"smart","postgres",p463714,th1746153216,"[local]",,2023-10-10 03:32:01 UTC,
0,con32,cmd9,seg-1,,,,sx1,"ERROR","57014","canceling statement due to user request, duration:4256.617",,,,,,
"select * from t, t as t1, t as t2;",0,,"postgres.c",4053,

The log schema is not changed.
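The message formatting can be sketched as follows. The function name and millisecond arithmetic are assumptions for illustration; the actual change lives in postgres.c.

```python
def cancel_message(start_ms, now_ms):
    """Format the cancel error with the statement's elapsed time in ms
    (illustrative sketch of the message shown in the log above)."""
    duration = now_ms - start_ms
    return ("canceling statement due to user request, duration:%.3f"
            % duration)
```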
…5673)

The coordinator and all segments are required to have the same set of collations.
This function was tuned to support the MPP structure via DispatchCollationCreate,
but only for importing collations from libc; once ICU is enabled, it fails.

This commit updates DispatchCollationCreate to support dispatching ICU
collations. A new argument `provider` is added and the unused `encoding`
argument is removed.
When a session exits within a transaction block, temp tables created
in the session are left behind. This is because ResetAllGangs resets
myTempNamespace, so the normal temp table catalog cleanup won't be
called. More details can be found at
https://github.com/greenplum-db/gpdb/issues/16506

Note that when a session is about to exit on the coordinator, there
is no need to drop temp tables by calling GpDropTempTables; the
callback function will do that at the end.
The FIXME in create_ctescan_path is as below:
```
// GPDB_96_MERGE_FIXME: Why do we set pathkeys in GPDB, but not in Postgres?
// pathnode->pathkeys = NIL;    /* XXX for now, result is always unordered */
pathnode->pathkeys = pathkeys;
```
We cannot find the commit explaining why GPDB sets pathkeys here while Postgres
does not; the line "pathnode->pathkeys = pathkeys;" was already present after
commit 6b0e52b, and the historical repositories offer no further information.

Sometimes we can avoid an extra Sort path when the pathkeys of the CTE node and
the pathkeys of its subpath are the same, so we should keep this code.
Just remove the GPDB_96_MERGE_FIXME.
The test was flaky with a diff like:

```
@@ -397,9 +397,9 @@
 (8 rows)

 select list_tablespace_dbid_dirs(:expected_number_of_tablespaces, :'tablespace_location');
-                                 list_tablespace_dbid_dirs
+                                                 list_tablespace_dbid_dirs
 ------------
- {Success,"expected 1 tablespaces","found 1 tablespaces",/tmp/my_tablespace_for_testing/2}
+ {Failed,"expected 1 tablespaces","found 2 tablespaces",/tmp/my_tablespace_for_testing/2,/tmp/my_tablespace_for_testing/5}
 (1 row)
```

The reason is that there's a chance the mirror has already replayed the WAL to
create the tablespace, so we can see two orphaned tablespace directories instead
of one. Now make the test panic even before we write the WAL that creates the
tablespace, so it is no longer flaky.
Add some help information for the gpfdist flag 'ssl_verify_peer'.
Once a logical slot has acquired a catalog_xmin, it doesn't let go of
it, even when invalidated by exceeding max_slot_wal_keep_size, which
means that dead catalog tuples are not removed by vacuum anymore from
the point of invalidation until the slot is dropped. This could be
catastrophic if catalog churn is high.

Change the computation of Xmin to ignore invalidated slots,
to prevent dead rows from accumulating.
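The changed computation can be sketched in Python. This is an illustration only: the real logic is C in the slot machinery, and treating a cleared restart_lsn as the invalidation marker is an assumption drawn from how slot invalidation is described here.

```python
def compute_required_catalog_xmin(slots):
    """Return the oldest catalog_xmin across valid slots, or None.

    Slots invalidated by exceeding max_slot_wal_keep_size (modeled here
    as restart_lsn being cleared to None) are skipped, so they no longer
    hold back catalog vacuum (sketch).
    """
    xmins = [s["catalog_xmin"] for s in slots
             if s["catalog_xmin"] is not None
             and s["restart_lsn"] is not None]  # skip invalidated slots
    return min(xmins) if xmins else None
```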

Backpatch to 13, where slot invalidation appeared.

Author: Sirisha Chamarthi <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Discussion: https://postgr.es/m/CAKrAKeUEDeqquN9vwzNeG-CN8wuVsfRYbeOUV9qKO_RHok=j+g@mail.gmail.com
(cherry picked from commit 36eeb37cd611c0a0bfb5743d9ddbef8f04fc87f3)
Although WAL replication is used as the HA mechanism by all segments
in a Greenplum cluster, there are some differences in how it is used
by coordinator/standby and by primary/mirror segments.  One such
difference is the use of replication slots.  Primary/mirror WAL
replication makes use of replication slots to ensure that a primary
retains WAL that may still be needed by its mirror.

Replication slots are not used for coordinator/standby.  To ensure
that coordinator retains WAL that is not yet received by the standby,
possibly due to replication lag, a special mechanism is used by
coordinator to track the flush location of standby.

This patch confines the use of this special mechanism to the
coordinator; the mechanism is no longer used by primary segments.
Ideally, coordinator/standby could also make use of replication
slots.  The GUC max_slot_wal_keep_size can be set to the lowest value
of (checkpoint_segments * XLOG_BLCKSZ) on the coordinator to ensure
that WAL doesn't fill up disk space when the standby goes down.  That
could be done in a followup patch.

Reviewed by: Ashwin Agrawal <[email protected]>
Reviewed by: Paul Guo <[email protected]>

(cherry picked from commit 0f2d65f497245eb00b5fc544cf609e79ddd75226)
The current WAL record pointer passed by the test to KeepLogSeg was
older than _logSegNo.  This cannot happen, so fix it by increasing it
beyond _logSegNo.  The test needs to be refactored heavily to make use
of the new WAL format, but let's do that as a follow-up.

Reviewed by: Ashwin Agrawal <[email protected]>
Reviewed by: Paul Guo <[email protected]>

(cherry picked from commit e4ec851b151267fe3209af6804e438611836a875)
@yjhjstz yjhjstz added the cherry-pick cherry-pick upstream commits label Jan 21, 2025
…ch) (#14978)

This PR fixes the bug reported in [issue 14644](https://github.com/greenplum-db/gpdb/issues/14644)

**Bug Detail:**
After analyzing a table with a dropped subclass, the value on the coordinator may differ from the segments. See the SQL result with bug below:
```
gpadmin=# CREATE TYPE test_type2 AS (a int, b text);
CREATE TYPE
gpadmin=# CREATE TABLE test_tbl2 OF test_type2;
CREATE TABLE
gpadmin=# CREATE TABLE test_tbl2_subclass () INHERITS (test_tbl2);
CREATE TABLE
gpadmin=# DROP TABLE test_tbl2_subclass;
DROP TABLE
gpadmin=# analyze;
ANALYZE
gpadmin=# select gp_segment_id, relhassubclass from pg_class where relname = 'test_tbl2';
gp_segment_id | relhassubclass
---------------+----------------
            -1 | t
(1 row)

gpadmin=# select gp_segment_id, relhassubclass from gp_dist_random('pg_class') where relname = 'test_tbl2';
 gp_segment_id | relhassubclass
---------------+----------------
             2 | f
             1 | f
             0 | f
(3 rows)
```

**Correct Behavior:**
The value of `relhassubclass` on the coordinator should be the same as on the segments.
```
gpadmin=# DROP TABLE test_tbl2_subclass;
DROP TABLE
gpadmin=# analyze;
ANALYZE
gpadmin=# select gp_segment_id, relhassubclass from pg_class where relname = 'test_tbl2';
gp_segment_id | relhassubclass
---------------+----------------
            -1 | f
(1 row)

gpadmin=# select gp_segment_id, relhassubclass from gp_dist_random('pg_class') where relname = 'test_tbl2';
 gp_segment_id | relhassubclass
---------------+----------------
             2 | f
             1 | f
             0 | f
(3 rows)
```
Signed-off-by: Yongtao Huang <[email protected]>
@yjhjstz yjhjstz marked this pull request as ready for review January 21, 2025 08:22
@yjhjstz yjhjstz merged commit a96a9df into apache:main Jan 22, 2025
22 checks passed