[Bug] Queries taking too long #23
Hi @johnf, thanks for reaching out to the team! Could you confirm a few details to help me get a sense of what might be going on?
I narrowed it down to apple_store because the query I pasted above looks like the problematic one. Right now it's been running for 10 hours. I'm not sure if you have any internal stats on the transformations, but if you do, I suspect you'll see the Apple ones are running all the time on my account ([email protected]). I'll pause it now so we can confirm 100%.
Hi @johnf, I reviewed your run logs and noticed the issue you mentioned. Could you try running this package on its own to see if the problem persists? We've seen similar hangups with Postgres in dbt projects involving multiple packages, often alongside minor updates, like adding a single text field in this case. More puzzling, I've also encountered this issue in another Postgres instance. The issue, however, hasn't appeared consistently enough for us to determine a root cause. I suspect Postgres locks might be a factor, though that's just a theory for now. Running the package on its own should help us narrow things down.
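If locks are a factor, one way to check is to look for blocked sessions while the model is hanging; a generic Postgres sketch, not specific to this package:

```sql
-- List blocked sessions along with the sessions blocking them.
-- pg_blocking_pids() is available in Postgres 9.6+.
select
    blocked.pid    as blocked_pid,
    blocked.query  as blocked_query,
    blocking.pid   as blocking_pid,
    blocking.query as blocking_query
from pg_stat_activity as blocked
join pg_stat_activity as blocking
    on blocking.pid = any(pg_blocking_pids(blocked.pid))
where cardinality(pg_blocking_pids(blocked.pid)) > 0;
```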
@fivetran-catfritz I can confirm that if I disable the apple_store package, the load stays at 10% for hours.
I've paused all transformations in the dashboard, commented out all other packages, and run this one locally. It's currently hanging on the same query.

I suspect it will sit there for hours again. FYI, I am happy to jump on a call if someone wants to guide me through running queries manually; this is very reproducible. I'm also happy for you to connect to my DB directly from your end.
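For reference, running just this package's models locally can be scoped with dbt's node selection; a sketch, not necessarily the exact command used here:

```sh
# run only the models belonging to the apple_store package
dbt run --select package:apple_store
```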
With no other queries running, it does finish eventually, but it takes 80 minutes. It isn't a super powerful instance.
Hi @johnf, thank you for running that! 80 minutes does seem a bit long for that size. Before we dive deeper, could you check one more thing? If you downgrade to the prior version of dbt_apple_store and run it independently, what is the run time like? I think it would help to have a couple of baselines.
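A downgrade like this is just a version pin in packages.yml; for example, if the package is installed from the dbt hub (hub name assumed here), it might look like:

```yaml
packages:
  - package: fivetran/apple_store
    version: 0.3.0
```

followed by `dbt deps` to re-resolve the packages.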
@fivetran-catfritz OK, good news: it's super fast on the old version, only 3 seconds.
For reference, this is the diff between the queries:

```diff
--- before	2024-08-28 07:54:58.434667925 +1000
+++ after	2024-08-28 07:54:59.798667716 +1000
@@ -23,6 +23,7 @@
 ),
 reporting_grain as (
     select distinct
+        source_relation,
         date_day,
         app_id,
         source_type,
@@ -31,6 +32,7 @@
 ),
 joined as (
     select
+        reporting_grain.source_relation,
         reporting_grain.date_day,
         reporting_grain.app_id,
         app.app_name,
@@ -54,18 +56,22 @@
     from reporting_grain
     left join app
         on reporting_grain.app_id = app.app_id
+        and reporting_grain.source_relation = app.source_relation
     left join app_store_territory
         on reporting_grain.date_day = app_store_territory.date_day
+        and reporting_grain.source_relation = app_store_territory.source_relation
         and reporting_grain.app_id = app_store_territory.app_id
         and reporting_grain.source_type = app_store_territory.source_type
         and reporting_grain.territory = app_store_territory.territory
     left join downloads_territory
         on reporting_grain.date_day = downloads_territory.date_day
+        and reporting_grain.source_relation = downloads_territory.source_relation
         and reporting_grain.app_id = downloads_territory.app_id
         and reporting_grain.source_type = downloads_territory.source_type
         and reporting_grain.territory = downloads_territory.territory
     left join usage_territory
         on reporting_grain.date_day = usage_territory.date_day
+        and reporting_grain.source_relation = usage_territory.source_relation
         and reporting_grain.app_id = usage_territory.app_id
         and reporting_grain.source_type = usage_territory.source_type
         and reporting_grain.territory = usage_territory.territory
```

I also ran EXPLAIN on both queries, but it's Greek to me 😁

FAST:

SLOW:
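For reference, plans like the FAST/SLOW ones above can be captured with EXPLAIN; a generic sketch against one of the tables from this query:

```sql
-- Estimated plan only (safe even for a query that never finishes):
explain
select count(*)
from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app_store_territory";

-- Estimated plan plus actual timings and buffer usage (executes the query):
explain (analyze, buffers)
select count(*)
from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app_store_territory";
```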
@johnf Oh wow, OK, something is going on there. Thank you for confirming that. The only change from v0.3.0 to v0.4.0 is to allow users to union together schemas created by multiple connectors, which is what the new source_relation column and join conditions support.

As for the proper fix, let me talk to my team and get some suggestions on what to explore next. Thanks again for your help with this!
Hi @johnf, the next thing we'd like to do is try to narrow down which aspect of the updates is causing the hangup. I have created a test branch that removes the source_relation join conditions:

```yaml
packages:
  - git: https://github.com/fivetran/dbt_apple_store.git
    revision: test/remove-source-relation-joins
    warn-unpinned: false
```

Let me know if you have any questions!
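After pointing packages.yml at a git branch like this, the package has to be re-fetched before the next run; roughly:

```sh
dbt deps   # re-resolve and pull the test branch
dbt run    # or scope down with --select to just this package's models
```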
Hi @fivetran-catfritz, the branch works fine. It took about 4 seconds for the bad query.
Very interesting. Thank you @johnf for running the test! It confirms there is definitely something going on with those joins. We'll look into this some more and keep you posted.
Hey @johnf 👋 We've done some investigating here and believe that, on Postgres, the join conditions which end up joining on empty strings (the default `source_relation` value when the union feature is not in use) are what's tripping up the query planner.
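To illustrate the suspicion (a hypothetical rendering, not the package's exact compiled SQL): when no union variables are set, `source_relation` renders as a constant empty string on both sides, so each join carries a predicate that is true for every row pair:

```sql
-- Hypothetical shape of the compiled SQL when no union variables are set:
-- both sides render source_relation as the constant ''.
select reporting_grain.date_day, app.app_name
from reporting_grain
left join app
    on reporting_grain.app_id = app.app_id
    and reporting_grain.source_relation = app.source_relation  -- effectively '' = ''
```

A constant-true predicate like this can badly skew the planner's selectivity estimates, which would fit the runaway plans described above.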
We have a potential solution, which we've applied in the following package version:

```yaml
packages:
  - git: https://github.com/fivetran/dbt_apple_store.git
    revision: feature/union-data-performance-enhancement
    warn-unpinned: false
```

If you want, you can install the above package version and run the following command; I only made the relevant changes to the territory report model:

```sh
dbt run -s +apple_store__territory_report
```

Let me know if this helps the performance!
I just tested this and I think it has the same problem. The query that ran is below; it's still running after 10 minutes. FYI, if it's possible on your end, I'm happy for you to connect straight to the DB to test.

```sql
/* {"app": "dbt", "dbt_version": "1.7.17", "profile_name": "gladly", "target_name": "prod", "node_id": "model.apple_store.apple_store__territory_report"} */
create table "dwh_prod"."dbt_apple_store"."apple_store__territory_report__dbt_tmp"
as (
    with app as (
        select *
        from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app"
    ),

    app_store_territory as (
        select *
        from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__app_store_territory"
    ),

    country_codes as (
        select *
        from "dwh_prod"."dbt_apple_store_source"."apple_store_country_codes"
    ),

    downloads_territory as (
        select *
        from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__downloads_territory"
    ),

    usage_territory as (
        select *
        from "dwh_prod"."dbt_apple_store_stg"."stg_apple_store__usage_territory"
    ),

    reporting_grain as (
        select distinct
            source_relation,
            date_day,
            app_id,
            source_type,
            territory
        from app_store_territory
    ),

    joined as (
        select
            reporting_grain.source_relation,
            reporting_grain.date_day,
            reporting_grain.app_id,
            app.app_name,
            reporting_grain.source_type,
            reporting_grain.territory as territory_long,
            coalesce(official_country_codes.country_code_alpha_2, alternative_country_codes.country_code_alpha_2) as territory_short,
            coalesce(official_country_codes.region, alternative_country_codes.region) as region,
            coalesce(official_country_codes.sub_region, alternative_country_codes.sub_region) as sub_region,
            coalesce(app_store_territory.impressions, 0) as impressions,
            coalesce(app_store_territory.impressions_unique_device, 0) as impressions_unique_device,
            coalesce(app_store_territory.page_views, 0) as page_views,
            coalesce(app_store_territory.page_views_unique_device, 0) as page_views_unique_device,
            coalesce(downloads_territory.first_time_downloads, 0) as first_time_downloads,
            coalesce(downloads_territory.redownloads, 0) as redownloads,
            coalesce(downloads_territory.total_downloads, 0) as total_downloads,
            coalesce(usage_territory.active_devices, 0) as active_devices,
            coalesce(usage_territory.active_devices_last_30_days, 0) as active_devices_last_30_days,
            coalesce(usage_territory.deletions, 0) as deletions,
            coalesce(usage_territory.installations, 0) as installations,
            coalesce(usage_territory.sessions, 0) as sessions
        from reporting_grain
        left join app
            on reporting_grain.app_id = app.app_id
            and reporting_grain.source_relation = app.source_relation
        left join app_store_territory
            on reporting_grain.date_day = app_store_territory.date_day
            and reporting_grain.source_relation = app_store_territory.source_relation
            and reporting_grain.app_id = app_store_territory.app_id
            and reporting_grain.source_type = app_store_territory.source_type
            and reporting_grain.territory = app_store_territory.territory
        left join downloads_territory
            on reporting_grain.date_day = downloads_territory.date_day
            and reporting_grain.source_relation = downloads_territory.source_relation
            and reporting_grain.app_id = downloads_territory.app_id
            and reporting_grain.source_type = downloads_territory.source_type
            and reporting_grain.territory = downloads_territory.territory
        left join usage_territory
            on reporting_grain.date_day = usage_territory.date_day
            and reporting_grain.source_relation = usage_territory.source_relation
            and reporting_grain.app_id = usage_territory.app_id
            and reporting_grain.source_type = usage_territory.source_type
            and reporting_grain.territory = usage_territory.territory
        left join country_codes as official_country_codes
            on reporting_grain.territory = official_country_codes.country_name
        left join country_codes as alternative_country_codes
            on reporting_grain.territory = alternative_country_codes.alternative_country_name
    )

    select *
    from joined
);
```
Hey @johnf, thanks for sharing, and sorry that didn't seem to do the trick. The above was actually my last-ditch effort to see if we could avoid the issues we are seeing with constant expressions causing the query planner in Postgres to explode and take significantly longer than necessary.

Since the above didn't work, we are going to move forward with the approach where we remove the join condition only if the union schema feature is not being used. As @fivetran-catfritz shared in the branch above, we can confirm that removing the join condition when it is not necessary addresses the query performance issue. We are going to move forward with making this update to the Apple Store and Google Play models. What we envision this update to be is something along the lines of the following wherever we join on source_relation:

```sql
from reporting_grain
left join app
    on reporting_grain.app_id = app.app_id
    {{ "and reporting_grain.source_relation = app.source_relation" if var('apple_store_union_schemas',[]) or var('apple_store_union_databases',[]) }}
```

This way we exclude the join condition if the defined variables are empty (i.e. the union data feature is not in use). Ideally, we can consolidate this into a single macro that can be referenced in each model so the code is DRYer (see the sketch below), but this is what we will likely apply to these models and others going forward.

I see you originally mentioned you were open to contributing a PR to address this issue. Let me know if that is still of interest to you and I'd be happy to help facilitate the updates. Otherwise, we will add this to our upcoming sprint. Thanks!
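For illustration, a consolidated macro along those lines might look roughly like this (the macro name and file placement are hypothetical, not the package's actual implementation):

```sql
{# macros/apple_store_source_relation_join.sql -- hypothetical helper #}
{# Emits the source_relation equality only when the union data feature is enabled. #}
{% macro apple_store_source_relation_join(left_alias, right_alias) %}
    {% if var('apple_store_union_schemas', []) or var('apple_store_union_databases', []) %}
        and {{ left_alias }}.source_relation = {{ right_alias }}.source_relation
    {% endif %}
{% endmacro %}
```

Each model's join could then call `{{ apple_store_source_relation_join('reporting_grain', 'app') }}` instead of repeating the conditional.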
@fivetran-joemarkiewicz Sounds great. Let me know if you need any more testing.
Hi @johnf, I wanted to post back because we have been able to spend some time thinking through this issue and its solution, and we don't feel the solution initially proposed (removing the join condition for non-union_data environments) is scalable. This feature is applied to a large number of our existing dbt packages, and making this config change to every join is not something that will scale appropriately. We have a few other ideas we are exploring.

Unfortunately, our test Postgres instance isn't representative of a live environment, and it's difficult to validate whether the changes we are testing will properly address the issue. I recall you were open to sharing your Postgres instance to help with this bug fix. Before then, would you by chance be available for a 30-minute video call to explore this further and allow us to test our assumptions with you? Let me know if you would be interested. Thanks!
@fivetran-joemarkiewicz Sure, happy to help! I just connected with you on LinkedIn so we can organise a time.
Thanks so much for meeting with me yesterday @johnf to troubleshoot this issue! Following our meeting I believe we came to an understanding of where the issue was truly originating, and I feel I have found a viable and appropriate solution. If you would like, you can attempt to run the troublesome model using the below package version:

```yaml
packages:
  - git: https://github.com/fivetran/dbt_apple_store.git
    revision: bugfix/postgres-performance-updates
    warn-unpinned: false
```

A short summary of what the issue was and why I believe the above changes will solve it: ideally, we would not include the joins at all in your scenario. However, for other customers who do use the union feature, those join conditions are still required. I was able to reproduce the error you are seeing and also see this branch resolve it. On my end the territory report went from never finishing to completing in just 10 seconds (which seems similar to what you were experiencing in the past). I made similar updates to all the other end models, so you should see this issue resolved across the package.

Please let me know if this does the trick! If not, we can explore other options to address this performance issue. Thanks again!
@fivetran-joemarkiewicz Unfortunately this didn't help. But I discovered something interesting: if I run the dbt command, it hangs; if I grab the exact same query from --debug output and run it manually from psql, it finishes in under 30 seconds. Does dbt do anything special when it's talking to Postgres?
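For anyone reproducing this comparison, the flow is roughly as follows (paths are illustrative; the exact compiled-SQL path depends on the package layout):

```sh
# run just the model with verbose logging to capture the SQL dbt sends
dbt --debug run -s apple_store__territory_report

# dbt also writes the compiled/templated SQL here after a run
cat target/run/apple_store/models/apple_store__territory_report.sql

# run the same statement by hand to compare timings
# (drop the __dbt_tmp table first if it already exists)
psql -d dwh_prod -f target/run/apple_store/models/apple_store__territory_report.sql
```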
I'm sorry to hear this didn't resolve your issue, @johnf 😞 Can you confirm whether, when running the new package version, you see any new tables materialized in your destination? The new version should have created these new tables, which are intended to help with the query performance. If those tables weren't created, then there may be something in the config that needs to be adjusted so they can materialize and you can benefit from the performance improvement.

I don't believe dbt does anything on the backend other than what you see in the target/run folder 🤔 I did encounter an issue when testing where, if I ran the original broken version right before running the new version, it slowed down my dbt job execution time. I essentially needed to let the CPU usage level out before attempting the newer version, and then it would work without issue on each subsequent run of the new version. I'm not sure if this is the same sequence you took, but I wanted to share in case it was.
@johnf Just wanted to bump my previous message.
@fivetran-joemarkiewicz Thanks for the reminder. I just cleared out dbt_packages and retried, and it works!! Maybe it was a caching issue in the packages or psql last time?? Thanks for making it happen.
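For reference, the reset amounts to something like this (assuming dbt_packages is the configured packages install path):

```sh
rm -rf dbt_packages   # clear previously installed package code
dbt deps              # re-download the pinned revisions
dbt run -s +apple_store__territory_report
```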
That's great news and thank you for confirming! We will be moving forward with this change in the current sprint.
Hi @johnf, thanks again for all your help with this issue. However, last week there was a change to the Apple Store Fivetran connector which means the majority of the reports this dbt package references are no longer syncing up-to-date data. The connector now has a new version you can switch to in order to obtain the same information; however, this new version uses a different schema from the one this package is built on, so we will need to modify this package to support the new schema. You can see more details in Issue #28, but you should also have received an email last Friday detailing the changes.

That being said, we will be prioritizing updating the dbt package to support the new schema. Fortunately, we have great learnings from this thread which we will apply to the updates in the coming weeks. I'll be sure to keep you in the loop once the updates are applied so you can confirm that the performance gains we achieved in the previously shared branch are retained in the coming update. Thank you, and we really appreciate your patience.
Is there an existing issue for this?
Describe the issue
Around the 2nd of August I upgraded everything (all packages and dbt), but in particular this package from 0.3.0 to 0.4.0.
Since then, the load on my small PostgreSQL instance has gone from an average of 11% to 90%.
What I see is the query below running for hours.
The only tables in this query with significant row counts are the territory tables, but they are only circa 124,000 rows.
I'm running PostgreSQL in Azure. I'm no DB expert, so I haven't really been able to dig into this.
Happy to debug and help as needed, or to be told I'm doing it wrong :)
Relevant error log or model output
Expected behavior
Query completes within a reasonable time
dbt Project configurations
Package versions
What database are you using dbt with?
postgres
dbt Version
1.7.17 (note: I'm also using dbt transformations; the same issue occurs in both places)
Additional Context
No response
Are you willing to open a PR to help address this issue?