feat(postgres-estimated-rows): pg Estimated Rows on Data Warehouse Sync #27634

Merged
phixMe merged 19 commits into master from feat(postgres-estimated-rows)
Jan 22, 2025

@phixMe phixMe commented Jan 17, 2025

Problem

We did not provide row estimates for Postgres, so users can't really estimate their costs.

Changes

  • Adds a call that looks up table-level row count estimates
  • Returns the estimates and renders them before sources are synced

Does this work well for both Cloud and self-hosted?

Yes

How did you test this code?

Ran it against a real Postgres instance and got row estimates back.

github-actions bot commented Jan 17, 2025

Size Change: +33 B (0%)

Total Size: 1.16 MB

Filename                   Size      Change
frontend/dist/toolbar.js   1.16 MB   +33 B (0%)

@posthog-bot

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

phixMe commented Jan 20, 2025

@Gilbert09 I tried this out on Metabase and every 'estimate' came back as zero, so I assume this won't work in our environment, or in all environments. I'm going to switch to a more traditional generated count(*) query. Unfortunately, I can't join literals and use them as table names in a subquery, so I'll have to generate a second, union-style query to do this. On the plus side, the value would no longer be an estimate.
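
For illustration only, here is a minimal sketch of how zero-valued estimates could be detected so that only the affected tables fall back to an exact count. It assumes the estimate came from planner statistics such as `pg_class.reltuples`, which stay at 0 (or -1 on PostgreSQL 14 and later) until a table has been vacuumed or analyzed. The PR does not say which catalog it queried, and `ESTIMATE_QUERY`, `usable_estimate`, and `tables_needing_exact_count` are hypothetical names, not code from this PR.

```python
from typing import Optional

# Assumption: estimates are read from pg_class.reltuples for ordinary tables
# in one schema. The PR itself does not show the estimate query.
ESTIMATE_QUERY = """
SELECT c.relname AS table_name, c.reltuples AS estimated_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = %s AND c.relkind = 'r'
"""


def usable_estimate(reltuples: float) -> Optional[int]:
    """Return the estimate as an int, or None when no statistic is available.

    A value of 0 or -1 is treated as "unknown". A genuinely empty table also
    reports 0, so it is simply counted exactly as well.
    """
    if reltuples <= 0:
        return None
    return int(reltuples)


def tables_needing_exact_count(estimates: dict[str, float]) -> list[str]:
    """List the tables whose estimate is unusable, sorted by name."""
    return sorted(name for name, value in estimates.items() if usable_estimate(value) is None)


# Hypothetical rows, as they might come back from ESTIMATE_QUERY.
sample = {"users": 15230.0, "events": 0.0, "sessions": -1.0}
fallback_tables = tables_needing_exact_count(sample)
```

Switching every table to `COUNT(*)`, as the comment above proposes, avoids this bookkeeping at the price of scanning each table.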

@phixMe phixMe requested review from Gilbert09 and EDsCODE January 21, 2025 16:42
@phixMe phixMe marked this pull request as ready for review January 21, 2025 16:43
@Gilbert09 Gilbert09 left a comment

A couple of questions before approving, but this looks good

@@ -20,6 +20,8 @@ export interface LemonTableColumn<T extends Record<string, any>, D extends keyof
    /** Tooltip to display on title hover. An info icon ("i" in circle) is shown when a tooltip is available. */
    tooltip?: string
    key?: string
    /** If true, the column is not displayed. Optional, defaults to not disabled. */
    is_disabled?: boolean

Member

maybe is_visible is better suited here - we use disabled elsewhere for when a UI element is still visible but not interactive

Contributor Author

That makes sense. I do want to keep the flag negative, both for semantics and so that it can stay an optional parameter, so I'll switch to isHidden.

    if not tables:
        return {}
    union = [
        f"SELECT '{table[0]}' AS table_name, COUNT(*) AS row_count FROM {schema}.{table[0]}" for table in tables

Member

Is it possible to SQL inject here? Can we parameterize this query instead? https://www.psycopg.org/psycopg3/docs/basic/params.html

Contributor Author

I considered this. The injection would have to come through the table name, which I don't think is possible short of some crazy escape-sequence hack. Still, I'd like to be more confident, so I'm going to add some substitutions when building the query expression. I think the substituted version reads a bit better as well, so no complaints there.
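
As a rough sketch of what such a substitution could look like (not the code that was merged), the example below passes each table name as a bound parameter wherever it is used as a value, and quotes it as an identifier wherever it names a schema or table. It uses only the standard library. `quote_ident` and `build_row_count_query` are hypothetical helpers, and the `%s` placeholder style assumes a psycopg-like driver. With psycopg itself, `psycopg.sql.Identifier` would normally take the place of the hand-written quoting.

```python
def quote_ident(name: str) -> str:
    """Quote a Postgres identifier: wrap it in double quotes and double any embedded quote."""
    if "\x00" in name:
        raise ValueError("identifier must not contain NUL bytes")
    return '"' + name.replace('"', '""') + '"'


def build_row_count_query(schema: str, tables: list[str]) -> tuple[str, list[str]]:
    """Return (query, params) that count the rows of every table in one UNION ALL statement.

    The table name used as a *value* goes through a placeholder; the schema and
    table names used as *identifiers* cannot be bound, so they are quoted instead.
    """
    if not tables:
        return "", []
    selects = [
        f"SELECT %s::text AS table_name, COUNT(*) AS row_count "
        f"FROM {quote_ident(schema)}.{quote_ident(table)}"
        for table in tables
    ]
    return " UNION ALL ".join(selects), list(tables)


# A hostile-looking name stays a single quoted identifier instead of becoming SQL.
query, params = build_row_count_query("public", ["users", 'odd"; DROP TABLE users; --'])
```

The resulting pair would then be handed to the driver, for example as `cursor.execute(query, params)`, so that the driver binds the literal values.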

@@ -309,6 +309,45 @@ def filter_postgres_incremental_fields(columns: list[tuple[str, str]]) -> list[t
return results


def get_postgres_row_count(

Member

This is all fine to live here for now, but we need to figure out a longer-term solution for these SQL sources. They're all over the place right now. That's some future pipeline work.

Contributor Author

Yeah, I noticed these definitions are a little chaotic. It's certainly worth consolidating them behind some base interfaces for the corresponding providers.

@Gilbert09 Gilbert09 left a comment

This looks great! :shipit:

@posthog-bot

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@phixMe phixMe merged commit f5b9e89 into master Jan 22, 2025
99 checks passed
@phixMe phixMe deleted the feat(postgres-estimated-rows) branch January 22, 2025 18:16
fuziontech added a commit that referenced this pull request Jan 22, 2025
* master: (103 commits)
  feat(postgres-estimated-rows): pg Estimated Rows on Data Warehouse Sync (#27634)
  fix: revert darkmode class toggle, updated content on fills (#27783)
  chore: upgrade posthog-js (#27790)
  chore(editor-3001): add back join actions (#27740)
  feat: Add person distinct ID overrides squash job (as dagster job) (#27710)
  fix(created-by-sources): Adding `created_by` to sources (#27751)
  Revert "feat(data-warehouse): V2 pipeline release " (#27791)
  fix: typo for feature flags (#27786)
  fix(defer-unmounting): Defer unmounting of react elements (#27742)
  feat(data-warehouse): V2 pipeline release (#27732)
  fix(data-warehouse): Ensure dates are actual datetime formats (#27777)
  fix: enable hot reload for the products dir (#27746)
  fix: assignee selector when null (#27737)
  chore: clarify rrweb imports (#27776)
  chore(deps): Update posthog-js to 1.207.3 (#27779)
  feat(retention): filters on start/return event (#27770)
  fix(experiments): only show supported math functions (#27589)
  feat(web-analytics): Set unique conversions graph when adding conversions goal (#27774)
  chore: color design system part 1: banner and accents (#27756)
  chore(experiments): Add tests for funnel attribution options (#27752)
  ...

sentry-io bot commented Jan 23, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ InsufficientPrivilege: permission denied for table email_meta_tag /api/projects/{parent_lookup_team_id}/external_... View Issue
  • ‼️ OperationalError: connection to server at "development-databasewriter2462cc03-jonothxrzxcf.czu26m4ieacl.us-west-2.r... /api/projects/{parent_lookup_team_id}/external_... View Issue

timgl pushed a commit that referenced this pull request Jan 28, 2025
…nc (#27634)

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>