Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adapter: Switch from GlobalId to CatalogItemId #30189

Merged
merged 29 commits into from
Nov 12, 2024

Conversation

ParkMyCar
Copy link
Member

@ParkMyCar ParkMyCar commented Oct 24, 2024

This PR switches the Adapter/Catalog from referencing items with a GlobalId, to a CatalogItemId.

Internally the compute and storage layers of Materialize will still reference objects by their GlobalId, but conceptually a GlobalId refers to a "collection of data", while a CatalogItemId refers to the database object. This switch allows us to associate multiple GlobalIds (aka "collections of data") with a single Catalog object. The primary motivator for this is supporting live schema migrations, or ALTER TABLE ... ADD COLUMN ....

When adding a new column to a table we'll create a new GlobalId, this ensures that the schema (aka RelationDesc) associated with a GlobalId never changes.

Design

The durable Catalog migration to add CatalogItemId took place in #30163.

What took a bit of thought is where to place the associated GlobalIds for each object, we have the following requirements:

  • Table: multiple GlobalIds
  • Source: single GlobalId (maybe eventually multiple?)
  • Log: single GlobalId
  • View: single GlobalId
  • MaterializedView: single GlobalId
  • ContinualTask: single GlobalId
  • Sink: single GlobalId
  • Index: single GlobalId
  • Type: no GlobalIds
  • Func: no GlobalIds
  • Secret: no GlobalIds
  • Connection: no GlobalIds

Type, Func, Secret, and Connection are never referenced by the compute and storage layer so they don't need a GlobalId. Meanwhile Tables can be evolved so they need to support multiple GlobalIds. What complicates this a bit is how we use the create_sql for an object as our durable record, while we could come up with some syntax like CREATE TABLE [<name> WITH u1 u2 u3] ... as a way to associate multiple GlobalIds with an object, this didn't feel great.

What I landed on was persisting a single GlobalId for all objects, and then including an extra_versions: BTreeMap<Version, GlobalId> at the durable layer. While not all types need a GlobalId, associating a single GlobalId for each object makes things easier to reason about. And while I would like to "design away invalid states", trying to associate multiple GlobalIds with only tables added a bunch of boilerplate which I didn't think was worth it.

But at the in-memory layer is where we enforce only Tables can have multiple GlobalIds. Instead of sticking a global_id: GlobalId field on CatalogEntry, which might feel natural, I put them on the inner enum CatalogItem variants. If a caller has just a CatalogEntry, they're forced to reason about their item possibly having multiple GlobalIds associated with it. But if they match on the inner CatalogItem they can get the inner object and generally get the single GlobalId, or if it's a table, reason about which GlobalId to use.

Motivation

Implements the approach described in: #30019
Progress towards: https://github.com/MaterializeInc/database-issues/issues/8233

Tips for reviewer

I tried splitting up this PR into multiple smaller ones, but the amount of shimming required to convert back and forth between GlobalId and CatalogItemId made the changes very noisy. As such, I split this single PR into multiple commits where each commit migrates a single logical code path.

Commits that require the most attention are annotated with a ⭐, and tagged by the teams most directly impacted.

  1. ⭐ [storage] Migrates code paths for Connections and Secrets to use CatalogItemId. This doesn't require much attention, it's largely just a find and replace since these objects are only ever referenced by the Catalog.
    • Something to call out though is this does change Protobuf definitions which AFAIK are not durably persisted anywhere. But someone from Storage should double check me.
  2. ⭐ [adapter] This updates our in-memory objects to store a GlobalId per-object, for Tables we store multiple GlobalIds. To review this commit I would start by looking at src/catalog/src/memory/objects.rs to get an understanding of where GlobalIds live and how the individual objects were updated. It also updates our durable Catalog transactions to remove the shims introduced in alter_table: Durable Catalog Migration #30163. What's subtle here is we also change the allocate_user_item_ids(...) API to return both a CatalogItemId and GlobalId with the same inner value. This ensures that that string representation of a CatalogItemId for an object will always be the same as the GlobalId, this allows us to sidestep the issue for now of updating builtin tables to map between CatalogItemId and GlobalId
  3. ⭐ [adapter] Migrates the builtin objects codepaths to associate both a CatalogItemId and a GlobalId with a builtin object. Builtin objects are handled separately from normal objects, hence having to migrate the separate code path.
  4. ⭐ [adapter] Changes SQL name resolution to use CatalogItemId. It also changes the ResolvedIds newtype from a BTreeSet<GlobalId> to a BTreeMap<CatalogItemId, BTreeSet<GlobalId>>. This way after name resolution we not only know what objects a statement refers to, but also their underlying collections. This also makes the inner field of ResolvedIds private and migrates its callsites.
  5. ⭐ [adapter] Adds new APIs to the Catalog to get items by their associated GlobalId, and adds a new trait called CatalogCollectionItem. I would start reviewing this commit by looking at src/sql/src/catalog.rs to get an idea of the API surface.
    • The new CatalogCollectionItem is really only necessary to support ALTER TABLE and isn't strictly necessary for this change, but I found that it adds context to the change. The desc(...) method is removed from trait CatalogItem and moved to trait CatalogCollectionItem. You can only get a CatalogCollectionItem (and thus RelationDesc) if you have a GlobalId or a CatalogItemId + Version.
  6. [compute] Updates the trait OptimizerCatalog to allow getting an item by a CatalogItemId, and updates implementations to use new Catalog APIs. This commit is small and only needs to be skimmed.
  7. [adapter] Introduces a new DependencyIds field on some Catalog items. ResolvedIds are derived from name resolution, meanwhile DependencyIds are derived from the HirRelationExpr created after planning.
    • This was added because the CatalogEntry::uses method wants to return a list of CatalogItemIds, but the HirRelationExpr only knows about GlobalIds. We could map from GlobalId -> CatalogItemId when calling the method, but this was a decent refactor on it's own, so I opted to cache these dependencies during planning on the few types of objects that need them.
  8. [adapter] Migrates the Coordinator startup/bootstrap codepath to use the new APIs introduced in [5]. No new logic is introduced here, it's largely just a find and replace like entry.id() to mat_view.global_id().
  9. ⭐ [adapter] Updates all create plans (i.e. PlanCreate*) to associate a GlobalId with each item.
  10. [adapter] Updates plans for COPY and INSERT to use CatalogItemId instead of GlobalId, since we also want to insert into collections at their latest version. This commit is largely a find and replace and can be skimmed.
  11. [adapter] Updates all drop plans (i.e. PlanDrop*) to use CatalogItemId, this is pretty much a find and replace and can be skimmed.
  12. [adapter] Updates all alter plans (i.e. PlanAlter*) to use CatalogItemId, this is pretty much a find and replace and can be skimmed.
  13. [adapter] Updates all code paths for COMMENT ON and Webhook Sources to use CatalogItemId. This is a fairly small commit and can definitely be skimmed.
  14. ⭐ [adapter/compute]Updates all Peek and Subscribe plans to use CatalogItemId. This commit is interesting and could use some extra thought since it's where we map from a catalog item to it's collection, and is one of the primary motivators of this work.
  15. [adapter] Updates all SHOW and INSPECT plans to use CatalogItemId. This commit is quite small and can be skimmed.
  16. [adapter] Updates Catalog Consistency Checks and PlanValidity to use CatalogItemId. This commit is quite small and can be skimmed.
  17. [adapter] Updates EXPLAIN plans to use the new GlobalId-based Catalog APIs. It's pretty straight forward and probably doesn't require too much attention.
  18. [adapter] This commit is a bit of a "grab bag", it updates code paths for auto-routing queries to mz_catalog_server, cluster scheduling, and "caught up" checks to map between CatalogItemId and GlobalId. It could use a bit of extra attention.
  19. ⭐ [adapter/persist] Updates ScalarTypes to use CatalogItemId instead of GlobalId.
    • IMO this is one of the sketchiest commits of the entire PR because we durably persist ScalarTypes in the Persist schema registry. This change should be okay though since the protobuf type of CatalogItemId is exactly the same as GlobalId (minus the Explain variant) so it is wire compatible.
  20. [adapter] Migrates all builtin table writes (aka the pack_* method) to use CatalogItemId`.
  21. Deletes the GlobalId::to_item_id and CatalogItemId::to_global_id shim methods.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ParkMyCar ParkMyCar force-pushed the alter_table2/everything-else branch from af814eb to 64bad48 Compare October 26, 2024 00:38
@ParkMyCar ParkMyCar marked this pull request as ready for review October 28, 2024 15:02
@ParkMyCar ParkMyCar requested review from a team and benesch as code owners October 28, 2024 15:02
@ParkMyCar ParkMyCar requested a review from jkosh44 October 28, 2024 15:02
Copy link

shepherdlybot bot commented Oct 28, 2024

Risk Score:83 / 100 Bug Hotspots:10 Resilience Coverage:33%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test 🔍 Detected
Risk Summary:

The pull request has a high-risk score of 83, driven by predictors such as the average line count in files and executable lines within files. Historically, PRs with these predictors are 157% more likely to cause a bug compared to the repo baseline. Additionally, 10 files modified in this PR are recent bug hotspots.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File Percentile
../memory/objects.rs 98
../catalog/builtin_table_updates.rs 92
../catalog/apply.rs 91
../src/pure.rs 98
../statement/ddl.rs 99
../inner/create_continual_task.rs 96
../src/coord.rs 98
../sequencer/inner.rs 96
../catalog/transact.rs 91
../coord/caught_up.rs 94

@def-
Copy link
Contributor

def- commented Oct 29, 2024

Nightly run triggered: https://buildkite.com/materialize/nightly/builds/10190

Copy link
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm about 8 commits deep, but thought I'd leave my comments so far.

src/catalog/src/memory/objects.rs Show resolved Hide resolved
src/catalog/src/memory/objects.rs Outdated Show resolved Hide resolved
src/adapter/src/catalog/open.rs Show resolved Hide resolved
src/adapter/src/catalog/consistency.rs Outdated Show resolved Hide resolved
src/catalog/src/memory/objects.rs Show resolved Hide resolved
src/adapter/src/coord.rs Outdated Show resolved Hide resolved
src/adapter/src/coord.rs Show resolved Hide resolved
Copy link
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not done, I have 3-4 commits left, but here's another batch of comments.

src/adapter/src/coord.rs Outdated Show resolved Hide resolved
src/adapter/src/coord.rs Show resolved Hide resolved
src/adapter/src/catalog/transact.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/sequencer/inner/peek.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/timeline.rs Outdated Show resolved Hide resolved
src/adapter/src/catalog/consistency.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I made it through all the commits.

src/repr/src/scalar.rs Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the term "Global ID" a user facing concept? If so we should consider updating the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so! Excluding developer design docs, there isn't any mention of global ids in our documentation

@ParkMyCar ParkMyCar mentioned this pull request Oct 31, 2024
5 tasks
@ParkMyCar
Copy link
Member Author

@jkosh44 Responded to most of the comments, planning to take a second pass and push up more changes, will let you know when the PR is ready for a second look!

@ParkMyCar
Copy link
Member Author

@jkosh44 this should be ready for a second look, just the two most recent commits

Copy link
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ParkMyCar ParkMyCar force-pushed the alter_table2/everything-else branch 2 times, most recently from b9a86a0 to c78ef44 Compare November 11, 2024 20:58
* Updates the memory/objects.rs types to fully use CatalogItemId
* Update the durable Catalog Transaction type, and its methods to use CatalogItemId
* Changes the User and System ID allocators to return both a CatalogItemId and a GlobalId

in-memory objects 5
* Update Catalog Transaction APIs to support builtin objects
* Update builtin items migration to use CatalogItemId
* Update open/startup paths to use CatalogItemId
* Updates name resolution to track a BTreeMap<CatalogItemId, BTreeSet<GlobalId>>
* Update callsites to access resolved IDs via methods instead of reaching into the ResolvedIds newtype
* Add a map of GlobalId to CatalogItemId to CatalogState
* refactor try_get_item(...) and get_item(...) Catalog APIs to use CatalogItemId
* add try_get_item_by_global_id(...) and get_item_by_global_id(...) APIs
* add CatalogCollectionItem trait, move desc(...) -> RelationDesc method from CatalogItem to this new trait
* update the APIs on the OptimizerCatalog to support both CatalogItemIds and GlobalIds
* For Views, MaterializedViews, and ContinualTasks track a set of dependencies, which is separate from resolved_ids
* Used for the CatalogEntry::uses(...) method
* use the new Catalog APIs and update the Coordintator's startup/bootstrap code path
* Update Create plans for Source, View, MaterializedView, ContinualTask, Sink, Table, and Connection to use CatalogItemId and include a GlobalId
* update plans and sequencing for COPY and INSERT plans
* update plans for Dropping objects
* update Catalog Transactions
* update planning and sequencing for Alter* plans
* update Catalog transaction handling for Alter* plans
* small changes to support COMMENT ON and Webhook sources
* update planning and sequencing for Peeks and Subscribes
* update timestamp and timedomain logic
* update planning and sequencing for EXPLAIN plans
* this is a bit of a grab-bag commit, but is everything to do with cluster routing, scheduling, and caught up detection
* Updates List, Map, and Record variants of ScalarType to reference CatalogItemId
* updates all pack_* methods to use CatalogItemId
* couple of missed changes
* updates to error types
* fixup a legit bug when bootstrapping storage collections
* refactor some codepaths
* fix clippy
* make Catalog timeline APIs use CatalogItemId instead of GlobalId
* update comment describing TODO
* remove use<'_> until we rebase the PR
* mostly making changes after the removal of the global write lock
* small changes here and there
* previously I refactored the timeline context methods in the Catalog to use CatalogItemId, some of them need to use GlobalId
  because they're called with Transient dataflows, e.g. Peeks
* update builtin migration test that I had previously broken
* regen snapshots. purposefully disable the statistics stability test because it generates huge amounts of data
* add proptest to make sure GlobalId and CatalogItemId roundtrip
@ParkMyCar ParkMyCar force-pushed the alter_table2/everything-else branch from 1788883 to 6458a38 Compare November 12, 2024 16:49
@ParkMyCar ParkMyCar enabled auto-merge (squash) November 12, 2024 17:57
@ParkMyCar ParkMyCar merged commit 235d12a into MaterializeInc:main Nov 12, 2024
148 of 214 checks passed
@ParkMyCar ParkMyCar mentioned this pull request Nov 12, 2024
5 tasks
ParkMyCar added a commit that referenced this pull request Nov 12, 2024
`buf` is a bit aggressive in what it considers breaking. Some changes I
merged in #30189 were
not wire breaking but `buf` thought they were. To prevent needing a
force push to main I marked them as temporarily ignored for breaking
changes, this PR unignores them.

### Motivation

Re-enable protobuf linting on temporarily ignored files.

### Checklist

- [x] This PR has adequate test coverage / QA involvement has been duly
considered. ([trigger-ci for additional test/nightly
runs](https://trigger-ci.dev.materialize.com/))
- [x] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [x] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [x] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [x] If this PR includes major [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note),
I have pinged the relevant PM to schedule a changelog post.
@ParkMyCar ParkMyCar deleted the alter_table2/everything-else branch November 18, 2024 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants