Proxy release 2025-02-06 #10691

Merged 78 commits into release-proxy from rc/release-proxy/2025-02-06 on Feb 6, 2025
Conversation

@vipvap commented Feb 6, 2025

Proxy release 2025-02-06

Please merge this Pull Request using the 'Create a merge commit' button.

alexanderlaw and others added 30 commits January 30, 2025 09:09
## Problem

The current global `pageserver_layers_visited_per_vectored_read_global`
metric does not appear to accurately measure read amplification. It
divides the layer count by the number of reads in a batch, but this
means that e.g. 10 reads with 100 L0 layers will only measure a read amp
of 10 per read, while the actual read amp was 100.

While the cost of layer visits is amortized across the batch, and some
layers may not intersect with a given key, each visited layer
contributes directly to the observed latency for every read in the
batch, which is what we care about.

Touches neondatabase/cloud#23283.
Extracted from #10566.

## Summary of changes

* Count the number of layers visited towards each read in the batch,
instead of the average across the batch (sketched below).
* Rename `pageserver_layers_visited_per_vectored_read_global` to
`pageserver_layers_per_read_global`.
* Reduce the read amp log warning threshold down from 512 to 100.
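For illustration, a minimal Rust sketch (hypothetical helper, assuming the prometheus crate) of the difference between the old averaged observation and the new per-read counting; with 10 reads and 100 layers, the old scheme observes 10 per read while the new one observes 100 for each read:

```rust
use prometheus::Histogram;

// Hypothetical helper contrasting the two accounting schemes.
fn record_read_amp(layers_visited: u64, reads_in_batch: u64, histogram: &Histogram) {
    // Before: one averaged observation per batch, e.g. 100 layers / 10 reads = 10.
    // histogram.observe(layers_visited as f64 / reads_in_batch as f64);

    // After: every read in the batch observes the full layer count (100),
    // since each visited layer contributes to every read's latency.
    for _ in 0..reads_in_batch {
        histogram.observe(layers_visited as f64);
    }
}
```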
## Problem

We suspect that Postgres checkpoints will limit the number of page
deltas necessary to reconstruct a page, but don't know for certain.

Touches neondatabase/cloud#23283.

## Summary of changes

Add `pageserver_deltas_per_read_global` metric.

This pairs with `pageserver_layers_per_read_global` from #10573.
We don't use statically linked OpenSSL anymore (#10302), so it's OK to
switch to Neon's pgbench for pgvector benchmarks.
## Problem

We don't have per-timeline observability for read amplification.

Touches neondatabase/cloud#23283.

## Summary of changes

Add a per-timeline `pageserver_layers_per_read` histogram.

NB: per-timeline histograms are expensive, but probably worth it in this
case.
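As a sketch of what a per-timeline histogram registration could look like (label names and buckets here are assumptions, using the prometheus crate); each label combination allocates its own bucket set, which is why per-timeline histograms are comparatively expensive:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

// Per-timeline read-amp histogram; every (tenant, shard, timeline) label
// combination allocates its own buckets, hence the cost caveat above.
static LAYERS_PER_READ: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "pageserver_layers_per_read",
        "Layers visited to serve a single read",
        &["tenant_id", "shard_id", "timeline_id"],
        vec![1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
    )
    .expect("failed to register histogram")
});
```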
## Problem

The client code for `tenant-set-preferred-az` declared response type
`()`, so it printed a spurious error on each use:
```
Error: receive body: error decoding response body: invalid type: map, expected unit at line 1 column 0
```

The requests were successful anyway.

## Summary of changes

- Declare the proper return type, so that the command succeeds quietly (see the sketch below)
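A sketch of the fix, assuming a reqwest-based client; the response struct and its (elided) fields are hypothetical:

```rust
use serde::Deserialize;

// The server returns a JSON map, so the declared response type must be a
// struct (even an empty one tolerates unknown fields), not `()`.
#[derive(Deserialize)]
struct SetPreferredAzResponse {
    // fields as returned by the API
}

async fn tenant_set_preferred_az(
    client: &reqwest::Client,
    url: &str,
) -> anyhow::Result<SetPreferredAzResponse> {
    // Decoding into the struct succeeds quietly; decoding into `()` fails
    // with "invalid type: map, expected unit".
    Ok(client.put(url).send().await?.json().await?)
}
```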
Related to #10308, we might have legitimate changes in file size or
generation. Those changes should not cause warn log lines.

In order to detect changes of the generation number while the file size
stayed the same, load the metadata that we store on disk on loading of
the timeline.

Still do a comparison with the on-disk layer sizes to find any
discrepancies that might occur due to race conditions (new metadata file
gets written but layer file has not been updated yet, and PS shuts
down). However, as it's possible to hit this in a race condition,
downgrade it to a warning.

Also fix a mistake in #10529: we want to compare the old with the new
metadata, not the old metadata with itself.
## Problem

In some cases, we were returning a very shallow error like `error
sending request for url (XXX)`, which made it very hard to figure out
the actual error.

## Summary of changes

Use `{:?}` in a few places, and remove it from places where we were
printing a string anyway.
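To see why `{:?}` helps with anyhow errors: Display shows only the outermost context line, while Debug prints the whole cause chain. A runnable illustration (the context string is borrowed from the example above):

```rust
use anyhow::Context;

fn main() {
    let err = std::fs::read("does-not-exist")
        .context("error sending request for url (XXX)")
        .unwrap_err();

    // Display: only the shallow context message.
    println!("{err}");
    // Debug: the full chain, including the underlying io::Error.
    println!("{err:?}");
}
```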
## Problem

close #10482

## Summary of changes

Add an extra lock on the read path to protect against races. The read
path assumes that only certain kinds of compaction can be performed.
Garbage keys must first be covered by an image layer over the range, and
only then be gc-ed -- they cannot be done in one operation. An
alternative fix would be to acquire the layers read guard at the
beginning of `get_vectored_reconstruct_data_timeline`, but that was
intentionally optimized out and I don't want to regress.

The race is not limited to image layers. Gc-compaction will consolidate
deltas automatically and produce a flat delta layer (i.e., when we have
retain_lsns below the gc-horizon). The same race would also cause
behaviors like getting an un-replayable key history as in
#10049.

Signed-off-by: Alex Chi Z <[email protected]>
…10569)

It took me ages to figure out why it was failing on my laptop. What I
saw was that when the test issues the 'import_pgdata' timeline creation
to the pageserver, the pageserver actually performs a regular
'bootstrap' timeline creation by running initdb, with no importing. It
boiled down to the JSON request that the test uses:

```
        {
            "new_timeline_id": str(timeline_id),
            "import_pgdata": {
                "idempotency_key": str(idempotency),
                "location": {"LocalFs": {"path": str(importbucket.absolute())}},
            },
        },
```

and how serde deserializes it into Rust structs. The 'LocalFs' enum
variant in `models.rs` is gated on the 'testing' cargo feature. On a
non-testing build, the request got deserialized into the default
Bootstrap enum variant, as a valid
TimelineCreateRequestModeImportPgdata variant could not be formed.
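A self-contained sketch of the failure mode (type and field names simplified; this assumes the request mode is an untagged serde enum, which reproduces the described fallback):

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum TimelineCreateMode {
    ImportPgdata { import_pgdata: ImportPgdata },
    Bootstrap {},
}

#[derive(Debug, Deserialize)]
struct ImportPgdata {
    idempotency_key: String,
    location: Location,
}

#[derive(Debug, Deserialize)]
enum Location {
    #[cfg(feature = "testing")]
    LocalFs { path: String },
    AwsS3 { bucket: String },
}

fn main() {
    let json = r#"{
        "import_pgdata": {
            "idempotency_key": "k1",
            "location": {"LocalFs": {"path": "/some/bucket"}}
        }
    }"#;
    // Without the "testing" feature, Location::LocalFs does not exist, the
    // first variant cannot be formed, and the untagged enum silently falls
    // back to Bootstrap -- a plain bootstrap instead of an import.
    let mode: TimelineCreateMode = serde_json::from_str(json).unwrap();
    println!("{mode:?}");
}
```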

PS. IMHO we should get rid of the testing feature, compile in all the
functionality, and have a runtime flag to disable anything dangerous.
With that, you would've gotten a nice "feature only enabled in testing
mode" error in this case, or the test would've simply worked. But that's
another story.
…0587)

## Problem

For OLAP benchmarks we need specific connstr secrets with different
database names for each job step.

This is a follow-up for #10536. In the previous PR we used a common
GitHub secret for a shared reuse project that has 4 databases: neondb,
tpch, clickbench and userexamples.

[Failure
example](https://neon-github-public-dev.s3.amazonaws.com/reports/main/13044872855/index.html#suites/54d0af6f403f1d8611e8894c2e07d023/fc029330265e9f6e/):


```log
# /tmp/neon/pg_install/v17/bin/psql user=neondb_owner dbname=neondb host=ep-broad-brook-w2luwzzv.us-east-2.aws.neon.build sslmode=require options='-cstatement_timeout=0 ' -c -- $ID$
-- TPC-H/TPC-R Pricing Summary Report Query (Q1)
-- Functional Query Definition
-- Approved February 1998
...
ERROR:  relation "lineitem" does not exist

```

## Summary of changes

We need dedicated GitHub secrets and dedicated connection strings for
each of the use cases.

## Test run
https://github.com/neondatabase/neon/actions/runs/13053968231
## Problem

The test asserts that it completes at least 10 full timeline lifecycles,
but the noisy CI environment sometimes doesn't meet that goal.

Related: #10389

## Summary of changes

- Sleep for longer between pageserver restarts, so that the timeline
workers have more of a chance to make progress
- Sleep for less time between retries in the timeline workers, so that
they have a better chance to get in while a pageserver is up between
restarts
- Relax the success condition to complete at least 5 iterations instead
of 10
…10592)

There was a regression caused by #10280, tracked in
[#23583](neondatabase/cloud#23583).

I have ideas for how to fix the issue, but we are too close to the
release cutoff, so revert #10280 for now. We can revert the revert later
:).
## Problem
We don't test the extensions shipped with contrib.
## Summary of changes
The tests are now run.
… bootstrap (#10532)

## Problem

Timeline bootstrap starts a flush loop, but doesn't reliably shut down
the timeline (incl. waiting for flush loop to exit) before destroying
UninitializedTimeline, and that destructor tries to clean up local
storage. If local storage is still being written to, then this is
unsound.

Currently the symptom is that we see a "Directory not empty" error log,
e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/main/12966756686/index.html#testresult/5523f7d15f46f7f7/retries

## Summary of changes

- Move the fallible IO part of bootstrap into a function (notably, this
is fallible in the case of the tenant being shut down while creation is
happening)
- When that function returns an error, call shutdown() on the timeline
(see the sketch below)
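A generic sketch of the pattern (all names hypothetical): run the fallible step and, on failure, await shutdown before propagating the error, so the flush loop has exited before any destructor removes local files.

```rust
use std::future::Future;

async fn with_shutdown_on_error<T, E, Work, Shut>(
    work: Work,
    shutdown: impl FnOnce() -> Shut,
) -> Result<T, E>
where
    Work: Future<Output = Result<T, E>>,
    Shut: Future<Output = ()>,
{
    match work.await {
        Ok(v) => Ok(v),
        Err(e) => {
            // Stop background loops (e.g. the flush loop) before the caller's
            // cleanup deletes the timeline's local directory.
            shutdown().await;
            Err(e)
        }
    }
}
```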
## Problem

In #10438 I got the function for picking tenants backwards: it was
preferring to move things _away_ from their preferred AZ.

## Summary of changes

- Fix condition in `is_attached_outside_preferred_az`
## Problem

This API is used in tests and occasionally for support. It casts all
errors to 500.

That can cause a failure on the log checks:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/13056992876/index.html#suites/ad9c266207b45eafe19909d1020dd987/683a7031d877f3db/

## Summary of changes

- Avoid using generic anyhow::Error for layer downloads
- Map shutdown cases to 503 in the http route (see the sketch below)
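A sketch of the idea with a hypothetical error type: a typed error lets the HTTP route distinguish shutdown from genuine failures, instead of collapsing everything into a 500 via anyhow.

```rust
use http::StatusCode;

// Hypothetical typed error for layer downloads.
enum DownloadError {
    ShuttingDown,
    NotFound,
    Other(String),
}

fn to_status(err: &DownloadError) -> StatusCode {
    match err {
        // Shutdown is expected and retryable: 503, not an alarming 500.
        DownloadError::ShuttingDown => StatusCode::SERVICE_UNAVAILABLE,
        DownloadError::NotFound => StatusCode::NOT_FOUND,
        DownloadError::Other(_) => StatusCode::INTERNAL_SERVER_ERROR,
    }
}
```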
## Problem

In #10438 it was pointed out
that it would be good to avoid picking tenants in ID order, and also to
avoid situations where we might double-select the same tenant.

There was an initial swing at this in
#10443, where Chi suggested a
simpler approach which is done in this PR

## Summary of changes

- Split the total set of tenants into in and out of home AZ
- Consume the out-of-home-AZ set first, and if necessary shuffle and
consume from the in-home-AZ set (sketched below)
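A sketch of the selection order (tenant IDs simplified to u64, using the rand crate): shuffle both halves to avoid ID-ordered picks, and chain them so no tenant is selected twice.

```rust
use rand::seq::SliceRandom;

fn pick_candidates(
    mut outside_home_az: Vec<u64>,
    mut in_home_az: Vec<u64>,
    n: usize,
) -> Vec<u64> {
    let mut rng = rand::thread_rng();
    // Shuffling removes the ID-order bias; draining by iterator means each
    // tenant is consumed at most once.
    outside_home_az.shuffle(&mut rng);
    in_home_az.shuffle(&mut rng);
    // Prefer tenants already outside their home AZ, then fall back.
    outside_home_az
        .into_iter()
        .chain(in_home_az)
        .take(n)
        .collect()
}
```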
The test runs this query:

    select count(*), sum(data::bigint)::bigint from t

to validate the test results between each part of the test. It performs
a simple sequential scan and aggregation, but was taking an order of
magnitude longer on v17 than on previous Postgres versions, which
sometimes caused the test to time out. There were two reasons for that:

1. On v17, the planner estimates the table to have only one row. In
reality it has 305790 rows, and older versions estimated it at 611580,
which is not too bad given that the table has not been analyzed so the
planner bases that estimate just on the number of pages and the widths
of the datatypes. The new estimate of 1 row is much worse, and it leads
the planner to disregard parallel plans, whereas on older versions you
got a Parallel Seq Scan.

I tracked this down to upstream commit 29cf61ade3, "Consider fillfactor
when estimating relation size". With that commit,
table_block_relation_estimate_size() function calculates that each page
accommodates less than 1 row when the fillfactor is taken into account,
which rounds down to 0. In reality, the executor will always place at
least one row on a page regardless of fillfactor, but the new estimation
formula doesn't take that into account.
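A small worked example with made-up numbers (not the actual test table) showing how the v17 formula can round to zero rows per page:

```rust
fn main() {
    let usable_page_bytes = 8192.0_f64; // BLCKSZ, ignoring page headers
    let fillfactor = 0.10;              // fillfactor = 10
    let est_row_width = 1200.0;         // estimated tuple width in bytes

    // v17: fillfactor scales the space available per page, and the result
    // truncates to 0 rows per page when a row is wider than that budget...
    let rows_per_page = (usable_page_bytes * fillfactor / est_row_width).floor();

    // ...so relpages * 0 collapses the whole-table estimate (clamped to 1),
    // even though the executor always fits at least one row per page.
    let relpages = 5000.0;
    println!("estimated rows = {}", (relpages * rows_per_page).max(1.0));
}
```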

I reported this to pgsql-hackers
(https://www.postgresql.org/message-id/2bf9d973-7789-4937-a7ca-0af9fb49c71e%40iki.fi);
we don't need to do anything more about it in Neon. It's OK to not use
parallel scans here; once issue 2 below is addressed, the queries are
fast enough without parallelism.

2. On v17, prefetching was not happening for the sequential scan. That's
because starting with v17, buffers are reserved in the shared buffer
cache before prefetching is initiated, and we use a tiny
shared_buffers=1MB setting in the tests. The prefetching is effectively
disabled with such a small shared_buffers setting, to protect the system
from completely starving out of buffers.

   To address that, simply bump up shared_buffers in the test.

This patch addresses the second issue, which is enough to fix the
problem.
## Problem

Because dashmap 6 switched to hashbrown's RawTable API, the upgrade
required us to use unsafe code:
#8107

## Summary of changes

Switch to clashmap, a fork maintained by me, which ultimately switches
from RawTable to HashTable and removes much of the unsafe requirement on
us.
Update `tokio` base crates and their deps. Pin `tokio` to at least 1.41
which stabilized task ID APIs.

To dedupe the `mio` dependency, the `notify` crate is updated. It's used
in `compute_tools`:

https://github.com/neondatabase/neon/blob/9f81828429ad6475b4fbb1a814240213b74bec63/compute_tools/src/pg_helpers.rs#L258-L367
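For reference, a small example of the task ID APIs stabilized in tokio 1.41 (the reason for the version pin above):

```rust
#[tokio::main]
async fn main() {
    // A task can query its own ID, and the JoinHandle exposes the same ID.
    let handle = tokio::spawn(async { tokio::task::id() });
    let id = handle.id();
    assert_eq!(handle.await.unwrap(), id);
    println!("task id: {id:?}");
}
```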
…10585)

## Problem

test_scrubber_tenant_snapshot restarts pageservers, but log validation
fails tests on any non-whitelisted storcon warnings, making the test
flaky.

## Summary of changes

Allow warnings like:
2025-01-29T12:37:42.622179Z WARN reconciler{seq=1
tenant_id=2011077aea9b4e8a60e8e8a19407634c shard_id=0004}: Call to node
2 (localhost:15352) management API failed, will retry (attempt 1):
receive body: error sending request for url
(http://localhost:15352/v1/tenant/2011077aea9b4e8a60e8e8a19407634c-0004/location_config):
client error (Connect)

ref #10462
…10594)

## Problem

If offloading races with normal shutdown, we get a "failed to freeze and
flush: cannot flush frozen layers when flush_loop is not running, state
is Exited" error. This is harmless, but it points to it being quite
strange to try to freeze and flush such a timeline: flushing on shutdown
for an archived timeline isn't useful.

Related: #10389

## Summary of changes

- During Timeline::shutdown, ignore ShutdownMode::FreezeAndFlush if the
timeline is archived (sketched below)
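A sketch of the check with simplified types (the variant names other than FreezeAndFlush are assumptions):

```rust
#[derive(PartialEq)]
enum ShutdownMode {
    FreezeAndFlush,
    Hard,
}

fn effective_shutdown_mode(requested: ShutdownMode, is_archived: bool) -> ShutdownMode {
    // An archived timeline's flush loop may already have exited, and flushing
    // it on shutdown isn't useful, so skip the freeze-and-flush step.
    if is_archived && requested == ShutdownMode::FreezeAndFlush {
        ShutdownMode::Hard
    } else {
        requested
    }
}
```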
We want to keep the AWS SDK up to date as that way we benefit from new
developments and improvements.

Prior update was in #10056
## Problem

This test may not fully detect data corruption during splits, since we
don't force-compact the entire keyspace.

## Summary of changes

Force-compact all data in `test_sharding_autosplit`.
…rl (#10575)

Ref: neondatabase/cloud#23461

## Problem

While making changes nearby, I noticed that these two base layers could
be optimized. Following a review comment from @myrrc, this also sets up
timeouts and retries in the `alpine/curl` image.

## Summary of changes
## Problem

We want to check that `diesel print-schema` doesn't generate any changes
(`storage_controller/src/schema.rs`) in comparison with the list of
migrations.

## Summary of changes
- Add `diesel_cli` to the `build-tools` image
- Add a `Check diesel schema` step to the `build-neon` job; at this
stage we have all required binaries, so we don't need to compile
anything additionally
- The check runs only on x86 release builds, to be sure we do it at
least once per CI run
Update to a version of our rust-postgres patches rebased onto
[this](sfackler/rust-postgres@98f5a11) commit this time.

With #10280 reapplied, this means that the rust-postgres crates will be
deduplicated, as the new crate versions are finally compatible with the
requirements of diesel-async.

Earlier update: #10561

rust-postgres PR: neondatabase/rust-postgres#39
hlinnaka and others added 7 commits February 5, 2025 18:07
…10655)

The full build of all extensions takes a long time. When working locally
on parts that don't need extensions, you can iterate more quickly by
skipping the unnecessary extensions.

This adds a build argument to the Dockerfile to specify which extensions
to build. There are three options:

- EXTENSIONS=all (default)
- EXTENSIONS=minimal: Build only a few extensions that are listed in
shared_preload_libraries in the default neon config.
- EXTENSIONS=none: Build no extensions (except for the mandatory 'neon'
extension).
## Problem

Compaction should never produce L0 layers. This never happened in my
experiments, but it would be good to guard against it early.

## Summary of changes

Disallow gc-compaction from producing L0 layers.

Signed-off-by: Alex Chi Z <[email protected]>
## Problem

The code is intended to reschedule compaction immediately if there are
pending tasks. Previously we set the duration to 0 when there were
pending tasks, but this goes through the `if period == Duration::ZERO {`
branch and sleeps for another 10 seconds.

## Summary of changes

Set the duration to 1 so that it doesn't sleep for too long (see the sketch below).

Signed-off-by: Alex Chi Z <[email protected]>
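A sketch of the scheduling logic in question (simplified; the 10-second poll interval is taken from the description above):

```rust
use std::time::Duration;

fn next_compaction_sleep(period: Duration) -> Duration {
    if period == Duration::ZERO {
        // Duration::ZERO means "compaction disabled": poll again in 10 s.
        // Using 0 to mean "reschedule immediately" lands here by mistake,
        // hence the change to pass a nonzero duration (1) instead.
        Duration::from_secs(10)
    } else {
        period
    }
}
```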
…0673)

# Problem

When we see an apparently slow request, one possible cause is that the
client is failing to consume responses, but we don't have a clear way to
see that.

# Solution

- Log the socket queue depths on slow/stuck connections, so that we have
an indication of whether the compute is keeping up with processing the
connection's responses (see the sketch below)

refs
- slack https://neondb.slack.com/archives/C036U0GRMRB/p1738652644396329
- refs neondatabase/cloud#23515
- refs neondatabase/cloud#23486
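One way to sample those queue depths on Linux (a sketch using the FIONREAD/TIOCOUTQ ioctls via the libc crate; the actual implementation may differ):

```rust
use std::os::unix::io::RawFd;

/// Returns (receive queue bytes, send queue bytes) for a socket fd.
fn socket_queue_depths(fd: RawFd) -> std::io::Result<(libc::c_int, libc::c_int)> {
    let mut recv_q: libc::c_int = 0;
    let mut send_q: libc::c_int = 0;
    unsafe {
        // Bytes received by the kernel but not yet read by the application.
        if libc::ioctl(fd, libc::FIONREAD as _, &mut recv_q) != 0 {
            return Err(std::io::Error::last_os_error());
        }
        // Bytes in the send queue not yet acknowledged by the peer.
        if libc::ioctl(fd, libc::TIOCOUTQ as _, &mut send_q) != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok((recv_q, send_q))
}
```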
## Problem

The is_neon_superuser() function is public in pg14/pg15
but statically defined in publicationcmd.c in pg16/pg17.

## Summary of changes

Make this function public for all Postgres versions. It is intended to
be used not only in publicationcmd.c.

See
neondatabase/postgres#573
neondatabase/postgres#576

---------

Co-authored-by: Konstantin Knizhnik <[email protected]>
## Problem

USE_ASSERT_CHECKING is defined as an empty macro, but it is checked
using #if.

## Summary of changes

Replace `#if USE_ASSERT_CHECKING` with `#ifdef USE_ASSERT_CHECKING`, as
done in other places in Postgres.

Co-authored-by: Konstantin Knizhnik <[email protected]>
@vipvap vipvap requested review from a team as code owners February 6, 2025 06:02
@vipvap vipvap requested review from hlinnaka, lubennikovaav, clipperhouse, NanoBjorn, VladLazar, awarus and petuhovskiy and removed request for a team February 6, 2025 06:02

github-actions bot commented Feb 6, 2025

7425 tests run: 7073 passed, 0 failed, 352 skipped (full report)


Flaky tests (1)

Postgres 17

Code coverage* (full report)

  • functions: 33.2% (8578 of 25805 functions)
  • lines: 49.1% (72144 of 147058 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
fedf4f1 at 2025-02-06T07:02:23.288Z :recycle:

@awarus awarus merged commit 3e62458 into release-proxy Feb 6, 2025
99 checks passed
@awarus awarus deleted the rc/release-proxy/2025-02-06 branch February 6, 2025 08:23