Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(workspace): gracefully handle failing db connections #194

Merged

Conversation

juleswritescode
Copy link
Collaborator

@juleswritescode juleswritescode commented Feb 3, 2025

With this PR, we should now be able to run basic diagnostics, even if the workspace can't connect to the Db. I tested it locally with pg_cli check test.sql and it seems to work.

The basic gist:

Whenever we called WorkspaceServer::update_settings, we would configure the DatabaseConnection for lazy-loading, but eagerly loaded the SchemaCache.

I thought it was weird that we queried the Db in a update_settings call, so I refactored the SchemaCache to be lazy-loaded as well.

For this, I wrapped the whole thing into a SchemaCacheManager which checks if we've already cached the requested PgPool's schema, returning guarded handles to the underlying SchemaCache.

Additional Things

  • moved SchemaCacheManager, DbConnection, and the async runtime helpers into their own modules
  • DbConnection::get_pool() will now panic if the developer hasn't set the DbConnection options
  • the completions handler will now return an empty array if we can't connect to the db
  • added an optional conn_timeout_secs config option, which is by default set to 10 seconds
  • removed SchemaCache::new(), since that only delegated to its Default impl

} else {
None
}
let pool = connection.get_pool();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of the below is just moved by a tab, because I removed the if let Some(..) = .. closure

@juleswritescode juleswritescode marked this pull request as ready for review February 3, 2025 09:37
@juleswritescode juleswritescode changed the title Fix/db conn timeouts fix(workspace): gracefully handle failing db connections Feb 3, 2025
Copy link
Collaborator

@psteinroe psteinroe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice! Thanks for the cleanup too. Left two comments, let's discuss them in the respective threads 🙏🏼

crates/pg_configuration/src/database.rs Outdated Show resolved Hide resolved
crates/pg_schema_cache/src/schema_cache.rs Outdated Show resolved Hide resolved
@juleswritescode
Copy link
Collaborator Author

juleswritescode commented Feb 8, 2025

@psteinroe

Regarding the "slowness":

I added a --skip-db flag that will make it such we never connect to any db. The DbConnection.get_pool() will simply return None.

Regarding the timeout:

I tried to add a deferred log to the console, something that would print "Your connection takes longer than expected…" if it takes >=2s – but that required wrapping the app.console in a mutex.

So instead, I added a user_hints vector that collects hints to print just after the diagnostics summary.
(We could reuse this e.g. when a user does not properly separate their statement with newlines for the statement splitter.)

I also added an AtomicBool that indicates whether we were able to get a PgPool. If not, we'll print another hint: "Skipped all checks requiring a database connection".

I'm not 100% happy with the result; especially the bool feels inelegant. So hit me with your feedback!

Copy link
Collaborator

@psteinroe psteinroe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the implementation a lot, but the entire user hint story feels like bloat? For now, I think we should either skip this "feedback" entirely or just add it as another "warning" diagnostics? but not sure

crates/pg_cli/src/cli_options.rs Outdated Show resolved Hide resolved
@juleswritescode
Copy link
Collaborator Author

juleswritescode commented Feb 9, 2025

I like the implementation a lot, but the entire user hint story feels like bloat? For now, I think we should either skip this "feedback" entirely or just add it as another "warning" diagnostics? but not sure

agree! I thought about adding it as a warning, but I can also see someone setting their CI to fail on "warn" instead of "error" – and a slow db connection shouldn't be the reason for that.

Let's leave it for now; maybe it's a non-issue and most people simply have working connections.

I do think it would be nice if we could output logs during and not only after the traversal, but let's descope that for now.

@juleswritescode
Copy link
Collaborator Author

juleswritescode commented Feb 9, 2025

@psteinroe Also, you're still on vacation for a day, so don't you dare review before tomorrow

Copy link
Collaborator

@psteinroe psteinroe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I am on German grounds again!)

Lgtm! Just have one comment about the timeout 🙏🏼


/// The connection timeout in seconds.
#[partial(bpaf(long("conn_timeout")))]
pub conn_timeout_secs: Option<u16>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but maybe we can set the default via bpaf?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, we should add it to the comments.

Copy link
Collaborator Author

@juleswritescode juleswritescode Feb 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with using a bpaf fallback, but the partial macro requires that the fallback is wrapped in an Option.

If we want to use display_fallback, we would also need to implement Display for Option<u16>, but we can't because of the Orphan Rule.

A quick fix is to add the fallback to the comment, wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, we can use the debug representation. Works for me, too.

Screenshot 2025-02-09 at 18 27 12

@psteinroe psteinroe merged commit 48b8100 into supabase-community:main Feb 10, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants