adapter: Switch from `GlobalId` to `CatalogItemId` #30189

ParkMyCar · 2024-10-24T20:58:36Z

This PR switches the Adapter/Catalog from referencing items with a GlobalId, to a CatalogItemId.

Internally the compute and storage layers of Materialize will still reference objects by their GlobalId, but conceptually a GlobalId refers to a "collection of data", while a CatalogItemId refers to the database object. This switch allows us to associate multiple GlobalIds (aka "collections of data") with a single Catalog object. The primary motivator for this is supporting live schema migrations, or ALTER TABLE ... ADD COLUMN ....

When adding a new column to a table we'll create a new GlobalId, this ensures that the schema (aka RelationDesc) associated with a GlobalId never changes.

Design

The durable Catalog migration to add CatalogItemId took place in #30163.

What took a bit of thought is where to place the associated GlobalIds for each object, we have the following requirements:

Table: multiple GlobalIds
Source: single GlobalId (maybe eventually multiple?)
Log: single GlobalId
View: single GlobalId
MaterializedView: single GlobalId
ContinualTask: single GlobalId
Sink: single GlobalId
Index: single GlobalId
Type: no GlobalIds
Func: no GlobalIds
Secret: no GlobalIds
Connection: no GlobalIds

Type, Func, Secret, and Connection are never referenced by the compute and storage layer so they don't need a GlobalId. Meanwhile Tables can be evolved so they need to support multiple GlobalIds. What complicates this a bit is how we use the create_sql for an object as our durable record, while we could come up with some syntax like CREATE TABLE [<name> WITH u1 u2 u3] ... as a way to associate multiple GlobalIds with an object, this didn't feel great.

What I landed on was persisting a single GlobalId for all objects, and then including an extra_versions: BTreeMap<Version, GlobalId> at the durable layer. While not all types need a GlobalId, associating a single GlobalId for each object makes things easier to reason about. And while I would like to "design away invalid states", trying to associate multiple GlobalIds with only tables added a bunch of boilerplate which I didn't think was worth it.

But at the in-memory layer is where we enforce only Tables can have multiple GlobalIds. Instead of sticking a global_id: GlobalId field on CatalogEntry, which might feel natural, I put them on the inner enum CatalogItem variants. If a caller has just a CatalogEntry, they're forced to reason about their item possibly having multiple GlobalIds associated with it. But if they match on the inner CatalogItem they can get the inner object and generally get the single GlobalId, or if it's a table, reason about which GlobalId to use.

Motivation

Implements the approach described in: #30019
Progress towards: https://github.com/MaterializeInc/database-issues/issues/8233

Tips for reviewer

I tried splitting up this PR into multiple smaller ones, but the amount of shimming required to convert back and forth between GlobalId and CatalogItemId made the changes very noisy. As such, I split this single PR into multiple commits where each commit migrates a single logical code path.

Commits that require the most attention are annotated with a ⭐, and tagged by the teams most directly impacted.

⭐ [storage] Migrates code paths for Connections and Secrets to use CatalogItemId. This doesn't require much attention, it's largely just a find and replace since these objects are only ever referenced by the Catalog.
- Something to call out though is this does change Protobuf definitions which AFAIK are not durably persisted anywhere. But someone from Storage should double check me.
⭐ [adapter] This updates our in-memory objects to store a GlobalId per-object, for Tables we store multiple GlobalIds. To review this commit I would start by looking at src/catalog/src/memory/objects.rs to get an understanding of where GlobalIds live and how the individual objects were updated. It also updates our durable Catalog transactions to remove the shims introduced in alter_table: Durable Catalog Migration #30163. What's subtle here is we also change the allocate_user_item_ids(...) API to return both a CatalogItemId and GlobalId with the same inner value. This ensures that that string representation of a CatalogItemId for an object will always be the same as the GlobalId, this allows us to sidestep the issue for now of updating builtin tables to map between CatalogItemId and GlobalId
⭐ [adapter] Migrates the builtin objects codepaths to associate both a CatalogItemId and a GlobalId with a builtin object. Builtin objects are handled separately from normal objects, hence having to migrate the separate code path.
⭐ [adapter] Changes SQL name resolution to use CatalogItemId. It also changes the ResolvedIds newtype from a BTreeSet<GlobalId> to a BTreeMap<CatalogItemId, BTreeSet<GlobalId>>. This way after name resolution we not only know what objects a statement refers to, but also their underlying collections. This also makes the inner field of ResolvedIds private and migrates its callsites.
⭐ [adapter] Adds new APIs to the Catalog to get items by their associated GlobalId, and adds a new trait called CatalogCollectionItem. I would start reviewing this commit by looking at src/sql/src/catalog.rs to get an idea of the API surface.
- The new CatalogCollectionItem is really only necessary to support ALTER TABLE and isn't strictly necessary for this change, but I found that it adds context to the change. The desc(...) method is removed from trait CatalogItem and moved to trait CatalogCollectionItem. You can only get a CatalogCollectionItem (and thus RelationDesc) if you have a GlobalId or a CatalogItemId + Version.
[compute] Updates the trait OptimizerCatalog to allow getting an item by a CatalogItemId, and updates implementations to use new Catalog APIs. This commit is small and only needs to be skimmed.
[adapter] Introduces a new DependencyIds field on some Catalog items. ResolvedIds are derived from name resolution, meanwhile DependencyIds are derived from the HirRelationExpr created after planning.
- This was added because the CatalogEntry::uses method wants to return a list of CatalogItemIds, but the HirRelationExpr only knows about GlobalIds. We could map from GlobalId -> CatalogItemId when calling the method, but this was a decent refactor on it's own, so I opted to cache these dependencies during planning on the few types of objects that need them.
[adapter] Migrates the Coordinator startup/bootstrap codepath to use the new APIs introduced in [5]. No new logic is introduced here, it's largely just a find and replace like entry.id() to mat_view.global_id().
⭐ [adapter] Updates all create plans (i.e. PlanCreate*) to associate a GlobalId with each item.
[adapter] Updates plans for COPY and INSERT to use CatalogItemId instead of GlobalId, since we also want to insert into collections at their latest version. This commit is largely a find and replace and can be skimmed.
[adapter] Updates all drop plans (i.e. PlanDrop*) to use CatalogItemId, this is pretty much a find and replace and can be skimmed.
[adapter] Updates all alter plans (i.e. PlanAlter*) to use CatalogItemId, this is pretty much a find and replace and can be skimmed.
[adapter] Updates all code paths for COMMENT ON and Webhook Sources to use CatalogItemId. This is a fairly small commit and can definitely be skimmed.
⭐ [adapter/compute]Updates all Peek and Subscribe plans to use CatalogItemId. This commit is interesting and could use some extra thought since it's where we map from a catalog item to it's collection, and is one of the primary motivators of this work.
[adapter] Updates all SHOW and INSPECT plans to use CatalogItemId. This commit is quite small and can be skimmed.
[adapter] Updates Catalog Consistency Checks and PlanValidity to use CatalogItemId. This commit is quite small and can be skimmed.
[adapter] Updates EXPLAIN plans to use the new GlobalId-based Catalog APIs. It's pretty straight forward and probably doesn't require too much attention.
[adapter] This commit is a bit of a "grab bag", it updates code paths for auto-routing queries to mz_catalog_server, cluster scheduling, and "caught up" checks to map between CatalogItemId and GlobalId. It could use a bit of extra attention.
⭐ [adapter/persist] Updates ScalarTypes to use CatalogItemId instead of GlobalId.
- IMO this is one of the sketchiest commits of the entire PR because we durably persist ScalarTypes in the Persist schema registry. This change should be okay though since the protobuf type of CatalogItemId is exactly the same as GlobalId (minus the Explain variant) so it is wire compatible.
[adapter] Migrates all builtin table writes (aka the pack_* method) to use CatalogItemId`.
Deletes the GlobalId::to_item_id and CatalogItemId::to_global_id shim methods.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

shepherdlybot · 2024-10-28T15:04:13Z

Mitigations

Completing required mitigations increases Resilience Coverage.

(Required) Code Review 🔍 Detected
(Required) Feature Flag
(Required) Integration Test
(Required) Observability
(Required) QA Review 🔍 Detected
(Required) Run Nightly Tests
Unit Test 🔍 Detected

Risk Summary:

The pull request has a high-risk score of 83, driven by predictors such as the average line count in files and executable lines within files. Historically, PRs with these predictors are 157% more likely to cause a bug compared to the repo baseline. Additionally, 10 files modified in this PR are recent bug hotspots.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File	Percentile
../memory/objects.rs	98
../catalog/builtin_table_updates.rs	92
../catalog/apply.rs	91
../src/pure.rs	98
../statement/ddl.rs	99
../inner/create_continual_task.rs	96
../src/coord.rs	98
../sequencer/inner.rs	96
../catalog/transact.rs	91
../coord/caught_up.rs	94

def- · 2024-10-29T09:19:42Z

Nightly run triggered: https://buildkite.com/materialize/nightly/builds/10190

jkosh44

I'm about 8 commits deep, but thought I'd leave my comments so far.

src/catalog/src/memory/objects.rs

src/adapter/src/catalog/open.rs

src/adapter/src/catalog/consistency.rs

src/catalog/src/memory/objects.rs

src/adapter/src/coord.rs

jkosh44

Still not done, I have 3-4 commits left, but here's another batch of comments.

src/adapter/src/coord.rs

src/adapter/src/catalog/transact.rs

src/adapter/src/coord/sequencer/inner/peek.rs

src/adapter/src/coord/timeline.rs

src/adapter/src/catalog/consistency.rs

jkosh44

Ok, I made it through all the commits.

src/repr/src/scalar.rs

jkosh44 · 2024-10-30T18:59:33Z

src/adapter/src/catalog/builtin_table_updates.rs

Is the term "Global ID" a user facing concept? If so we should consider updating the docs.

I don't believe so! Excluding developer design docs, there isn't any mention of global ids in our documentation

ParkMyCar · 2024-11-01T21:06:05Z

@jkosh44 Responded to most of the comments, planning to take a second pass and push up more changes, will let you know when the PR is ready for a second look!

ParkMyCar · 2024-11-04T19:41:58Z

@jkosh44 this should be ready for a second look, just the two most recent commits

jkosh44

LGTM

* Updates the memory/objects.rs types to fully use CatalogItemId * Update the durable Catalog Transaction type, and its methods to use CatalogItemId * Changes the User and System ID allocators to return both a CatalogItemId and a GlobalId in-memory objects 5

* Update Catalog Transaction APIs to support builtin objects * Update builtin items migration to use CatalogItemId * Update open/startup paths to use CatalogItemId

* Updates name resolution to track a BTreeMap<CatalogItemId, BTreeSet<GlobalId>> * Update callsites to access resolved IDs via methods instead of reaching into the ResolvedIds newtype

* Add a map of GlobalId to CatalogItemId to CatalogState * refactor try_get_item(...) and get_item(...) Catalog APIs to use CatalogItemId * add try_get_item_by_global_id(...) and get_item_by_global_id(...) APIs * add CatalogCollectionItem trait, move desc(...) -> RelationDesc method from CatalogItem to this new trait

* update the APIs on the OptimizerCatalog to support both CatalogItemIds and GlobalIds

* For Views, MaterializedViews, and ContinualTasks track a set of dependencies, which is separate from resolved_ids * Used for the CatalogEntry::uses(...) method

* use the new Catalog APIs and update the Coordintator's startup/bootstrap code path

* Update Create plans for Source, View, MaterializedView, ContinualTask, Sink, Table, and Connection to use CatalogItemId and include a GlobalId

* update plans and sequencing for COPY and INSERT plans

* update plans for Dropping objects * update Catalog Transactions

* update planning and sequencing for Alter* plans * update Catalog transaction handling for Alter* plans

* small changes to support COMMENT ON and Webhook sources

* update planning and sequencing for Peeks and Subscribes * update timestamp and timedomain logic

* update planning and sequencing for EXPLAIN plans

* this is a bit of a grab-bag commit, but is everything to do with cluster routing, scheduling, and caught up detection

* Updates List, Map, and Record variants of ScalarType to reference CatalogItemId

* updates all pack_* methods to use CatalogItemId

* couple of missed changes * updates to error types

* fixup a legit bug when bootstrapping storage collections * refactor some codepaths * fix clippy

* make Catalog timeline APIs use CatalogItemId instead of GlobalId * update comment describing TODO * remove use<'_> until we rebase the PR

* mostly making changes after the removal of the global write lock * small changes here and there

* previously I refactored the timeline context methods in the Catalog to use CatalogItemId, some of them need to use GlobalId because they're called with Transient dataflows, e.g. Peeks

* update builtin migration test that I had previously broken * regen snapshots. purposefully disable the statistics stability test because it generates huge amounts of data * add proptest to make sure GlobalId and CatalogItemId roundtrip

…'t know any better'

`buf` is a bit aggressive in what it considers breaking. Some changes I merged in #30189 were not wire breaking but `buf` thought they were. To prevent needing a force push to main I marked them as temporarily ignored for breaking changes, this PR unignores them. ### Motivation Re-enable protobuf linting on temporarily ignored files. ### Checklist - [x] This PR has adequate test coverage / QA involvement has been duly considered. ([trigger-ci for additional test/nightly runs](https://trigger-ci.dev.materialize.com/)) - [x] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design.  - [x] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label. - [x] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](MaterializeInc/cloud#5021)).  - [x] If this PR includes major [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note), I have pinged the relevant PM to schedule a changelog post.

ParkMyCar force-pushed the alter_table2/everything-else branch from af814eb to 64bad48 Compare October 26, 2024 00:38

ParkMyCar marked this pull request as ready for review October 28, 2024 15:02

ParkMyCar requested review from a team and benesch as code owners October 28, 2024 15:02

ParkMyCar requested a review from jkosh44 October 28, 2024 15:02

jkosh44 reviewed Oct 29, 2024

View reviewed changes

jkosh44 reviewed Oct 30, 2024

View reviewed changes

ParkMyCar mentioned this pull request Oct 31, 2024

do not merge #30113

Closed

5 tasks

jkosh44 approved these changes Nov 4, 2024

View reviewed changes

ParkMyCar force-pushed the alter_table2/everything-else branch 2 times, most recently from b9a86a0 to c78ef44 Compare November 11, 2024 20:58

ParkMyCar added 11 commits November 12, 2024 10:09

start, connections and secrets

b95cb6d

builtin object migrations

0fd66e2

* Update Catalog Transaction APIs to support builtin objects * Update builtin items migration to use CatalogItemId * Update open/startup paths to use CatalogItemId

change sql name resolution to use CatalogItemId

32cad3a

* Updates name resolution to track a BTreeMap<CatalogItemId, BTreeSet<GlobalId>> * Update callsites to access resolved IDs via methods instead of reaching into the ResolvedIds newtype

optimizer catalog

8b214f6

* update the APIs on the OptimizerCatalog to support both CatalogItemIds and GlobalIds

track dependent items for uses(...) method

939bd8a

* For Views, MaterializedViews, and ContinualTasks track a set of dependencies, which is separate from resolved_ids * Used for the CatalogEntry::uses(...) method

Coordinator Startup/bootstrap

4c5d55b

* use the new Catalog APIs and update the Coordintator's startup/bootstrap code path

update Create plans

8b958b0

* Update Create plans for Source, View, MaterializedView, ContinualTask, Sink, Table, and Connection to use CatalogItemId and include a GlobalId

copy and insert plans

45dcd6b

* update plans and sequencing for COPY and INSERT plans

drop plans

cc07f5e

* update plans for Dropping objects * update Catalog Transactions

ParkMyCar added 18 commits November 12, 2024 10:12

alter plans

ae3a064

* update planning and sequencing for Alter* plans * update Catalog transaction handling for Alter* plans

comments and webhooks

b421ac6

* small changes to support COMMENT ON and Webhook sources

peek and subscribe plans

b4f536d

* update planning and sequencing for Peeks and Subscribes * update timestamp and timedomain logic

SHOW and INSPECT queries

80c3a88

consistency checks and plan validity

f6eed99

EXPLAIN plans

d658861

* update planning and sequencing for EXPLAIN plans

cluster auto-scheduling and 'caught up' checks

8a306c3

* this is a bit of a grab-bag commit, but is everything to do with cluster routing, scheduling, and caught up detection

Scalar Types

92a1d3e

* Updates List, Map, and Record variants of ScalarType to reference CatalogItemId

builtin table updates

b744818

* updates all pack_* methods to use CatalogItemId

misc

6d83792

* couple of missed changes * updates to error types

remove shim methods

911d7d9

respond to Joe's GitHub feedback

c96770c

* fixup a legit bug when bootstrapping storage collections * refactor some codepaths * fix clippy

responding to GitHub feedback

3f41dda

* make Catalog timeline APIs use CatalogItemId instead of GlobalId * update comment describing TODO * remove use<'_> until we rebase the PR

fixes after rebasing

8fb7090

* mostly making changes after the removal of the global write lock * small changes here and there

refactor timeline context

7df75bb

* previously I refactored the timeline context methods in the Catalog to use CatalogItemId, some of them need to use GlobalId because they're called with Transient dataflows, e.g. Peeks

fix tests

42a4484

* update builtin migration test that I had previously broken * regen snapshots. purposefully disable the statistics stability test because it generates huge amounts of data * add proptest to make sure GlobalId and CatalogItemId roundtrip

fix import conflicts after rebase

15286cd

temporily ignore breaking changes on some proto files since buf doesn…

6458a38

…'t know any better'

ParkMyCar force-pushed the alter_table2/everything-else branch from 1788883 to 6458a38 Compare November 12, 2024 16:49

ParkMyCar enabled auto-merge (squash) November 12, 2024 17:57

ParkMyCar merged commit 235d12a into MaterializeInc:main Nov 12, 2024
148 of 214 checks passed

ParkMyCar mentioned this pull request Nov 12, 2024

proto: unignore breaking changes #30454

Merged

5 tasks

ParkMyCar deleted the alter_table2/everything-else branch November 18, 2024 16:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adapter: Switch from `GlobalId` to `CatalogItemId` #30189

adapter: Switch from `GlobalId` to `CatalogItemId` #30189

ParkMyCar commented Oct 24, 2024 •

edited

Loading

shepherdlybot bot commented Oct 28, 2024 •

edited

Loading

def- commented Oct 29, 2024

jkosh44 left a comment

jkosh44 left a comment

jkosh44 left a comment

jkosh44 Oct 30, 2024

ParkMyCar Nov 4, 2024

ParkMyCar commented Nov 1, 2024

ParkMyCar commented Nov 4, 2024

jkosh44 left a comment

adapter: Switch from GlobalId to CatalogItemId #30189

adapter: Switch from GlobalId to CatalogItemId #30189

Conversation

ParkMyCar commented Oct 24, 2024 • edited Loading

Design

Motivation

Tips for reviewer

Checklist

shepherdlybot bot commented Oct 28, 2024 • edited Loading

Mitigations

def- commented Oct 29, 2024

jkosh44 left a comment

Choose a reason for hiding this comment

jkosh44 left a comment

Choose a reason for hiding this comment

jkosh44 left a comment

Choose a reason for hiding this comment

jkosh44 Oct 30, 2024

Choose a reason for hiding this comment

ParkMyCar Nov 4, 2024

Choose a reason for hiding this comment

ParkMyCar commented Nov 1, 2024

ParkMyCar commented Nov 4, 2024

jkosh44 left a comment

Choose a reason for hiding this comment

adapter: Switch from `GlobalId` to `CatalogItemId` #30189

adapter: Switch from `GlobalId` to `CatalogItemId` #30189

ParkMyCar commented Oct 24, 2024 •

edited

Loading

shepherdlybot bot commented Oct 28, 2024 •

edited

Loading