Feature/add transformation runs #141

Merged 63 commits on Jan 22, 2025
Changes from 12 commits
Commits
fe37e04
add transformation_runs staging
fivetran-reneeli Dec 17, 2024
740b493
add seed
fivetran-reneeli Dec 17, 2024
1c536fc
fix name to platform and use most current staging template
fivetran-reneeli Dec 17, 2024
741a9bc
correct useage to usage
fivetran-reneeli Dec 17, 2024
ca1a85f
add transformation runs downstream, doc updates
fivetran-reneeli Dec 17, 2024
0132c30
docs
fivetran-reneeli Dec 17, 2024
f179557
changelog and versioning
fivetran-reneeli Dec 17, 2024
77a6dae
Merge branch 'main' into feature/add_transformation_runs
fivetran-reneeli Dec 19, 2024
008abb4
up version to v1.11.0, add docs
fivetran-reneeli Dec 19, 2024
83c76a1
derive measured month in staging and remove from downstream
fivetran-reneeli Dec 19, 2024
09a6ae5
docs
fivetran-reneeli Dec 19, 2024
9066d57
try new schema
fivetran-reneeli Dec 20, 2024
9a64100
add run count test, move the date cast upstream and rm the source rel…
fivetran-reneeli Jan 4, 2025
2487556
Update models/staging/src_fivetran_platform.yml
fivetran-reneeli Jan 4, 2025
13bffa5
Update models/staging/src_fivetran_platform.yml
fivetran-reneeli Jan 4, 2025
1aeec5f
Update models/staging/stg_fivetran_platform__transformation_runs.sql
fivetran-reneeli Jan 4, 2025
1e665d9
Update CHANGELOG.md
fivetran-reneeli Jan 4, 2025
5e1350e
Update models/staging/stg_fivetran_platform.yml
fivetran-reneeli Jan 4, 2025
875679e
Update models/staging/stg_fivetran_platform.yml
fivetran-reneeli Jan 4, 2025
c390c8f
Update models/staging/stg_fivetran_platform__transformation_runs.sql
fivetran-reneeli Jan 4, 2025
4979d60
Update models/staging/stg_fivetran_platform.yml
fivetran-reneeli Jan 4, 2025
1b1aee6
update schema, changelog, and add unique test
fivetran-reneeli Jan 4, 2025
30338cf
rm extra date cast
fivetran-reneeli Jan 4, 2025
1a5ec7b
try new schema for databricks
fivetran-reneeli Jan 4, 2025
b21a322
update schema for databricks sql in run script
fivetran-reneeli Jan 6, 2025
5c82182
add using_transformations variable to models
fivetran-reneeli Jan 8, 2025
7516e1a
add true test to run script for using_transformations
fivetran-reneeli Jan 8, 2025
e3e62ed
add using_transformations to quickstart
fivetran-reneeli Jan 8, 2025
c3be6df
add config to src
fivetran-reneeli Jan 8, 2025
8deed94
add commented out var for using_transformations for docs gen
fivetran-reneeli Jan 8, 2025
3849f8a
schema and update readme
fivetran-reneeli Jan 9, 2025
2c2fa6d
changelog
fivetran-reneeli Jan 9, 2025
5a0fa12
try schema again
fivetran-reneeli Jan 9, 2025
40bd14a
switch to using does table exist for transformation runs
fivetran-reneeli Jan 10, 2025
bc648d8
update docs and configs for the new logic for transformation_runs
fivetran-reneeli Jan 10, 2025
c42d41f
rm transformation_runs var from quickstart file, update run script to…
fivetran-reneeli Jan 10, 2025
1709106
schema
fivetran-reneeli Jan 10, 2025
27931d2
docs
fivetran-reneeli Jan 10, 2025
00587b5
updates to changelog
fivetran-reneeli Jan 10, 2025
10775d9
add freshness null
fivetran-reneeli Jan 14, 2025
5e25ebe
revert join case for usage cte
fivetran-reneeli Jan 14, 2025
f16db58
update some documentation
fivetran-reneeli Jan 14, 2025
b495f78
docs
fivetran-reneeli Jan 14, 2025
adc741c
Update DECISIONLOG.md
fivetran-reneeli Jan 16, 2025
315ac86
doc updates
fivetran-reneeli Jan 16, 2025
c148487
add false config for transformation runs
fivetran-reneeli Jan 16, 2025
ca007f3
schema
fivetran-reneeli Jan 16, 2025
b352519
databricks
fivetran-reneeli Jan 16, 2025
4827893
Merge branch 'main' into feature/add_transformation_runs
fivetran-reneeli Jan 21, 2025
a052a13
changelog
fivetran-reneeli Jan 22, 2025
0b2d46a
Update models/fivetran_platform__usage_mar_destination_history.sql
fivetran-reneeli Jan 22, 2025
b896ad0
Update models/staging/src_fivetran_platform.yml
fivetran-reneeli Jan 22, 2025
e6bfd21
add redshift limit 1
fivetran-reneeli Jan 22, 2025
6936d29
add documentation
fivetran-reneeli Jan 22, 2025
6aa7939
Merge branch 'feature/add_transformation_runs' of https://github.com/…
fivetran-reneeli Jan 22, 2025
15a55f3
merge
fivetran-reneeli Jan 22, 2025
2589cbe
changelog updates
fivetran-reneeli Jan 22, 2025
0cc1669
schema
fivetran-reneeli Jan 22, 2025
402758c
databricks schema change
fivetran-reneeli Jan 22, 2025
6334ff5
Update CHANGELOG.md
fivetran-reneeli Jan 22, 2025
9f095e8
Update models/staging/src_fivetran_platform.yml
fivetran-reneeli Jan 22, 2025
1865693
schema
fivetran-reneeli Jan 22, 2025
6471698
Merge branch 'feature/add_transformation_runs' of https://github.com/…
fivetran-reneeli Jan 22, 2025
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,18 @@
# dbt_fivetran_log v1.11.0
[PR #141](https://github.com/fivetran/dbt_fivetran_log/pull/141) includes the following updates:

## Schema Changes: Adding the transformation_runs table
- We have added the `transformation_runs` source table. This includes the following updates:
- Added a new staging model, `stg_fivetran_platform__transformation_runs`, along with a new tmp model, `stg_fivetran_platform__transformation_runs_tmp`, and a `get_transformation_runs_columns()` macro that ensures all required columns are present in the new staging model.
- Added the following fields to the `fivetran_platform__usage_mar_destination_history` end model for each destination and month:
- `paid_model_runs`
- `free_model_runs`
- `total_model_runs`
- Documented the fields in the `transformation_runs` source table and the aggregated `*_model_runs` fields.

## Under the Hood
- Added `transformation_runs` seed data in `integration_tests/seeds/`.

# dbt_fivetran_log v1.10.0
[PR #140](https://github.com/fivetran/dbt_fivetran_log/pull/140) includes the following updates:

@@ -10,6 +25,7 @@
## Under the Hood (Maintainers Only)
- Enhanced seed data for integration testing to include the different spellings and ensure compatibility with Redshift.


# dbt_fivetran_log v1.9.1
[PR #138](https://github.com/fivetran/dbt_fivetran_log/pull/138) includes the following updates:

2 changes: 1 addition & 1 deletion README.md
@@ -70,7 +70,7 @@ Include the following Fivetran Platform package version range in your `packages.
```yaml
packages:
- package: fivetran/fivetran_log
version: [">=1.10.0", "<1.11.0"]
version: [">=1.11.0", "<1.12.0"]
```

> Note that although the source connector is now "Fivetran Platform", the package retains the old name of "fivetran_log".
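
For reference, the updated range accepts any 1.11.x release and excludes 1.12.0 and later. Here is a minimal Python sketch of that check, assuming plain `major.minor.patch` version strings. The `parse` and `in_range` helpers are hypothetical and only illustrate the bounds; this is not how dbt itself resolves packages.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.11.0' into (1, 11, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str = "1.11.0", upper: str = "1.12.0") -> bool:
    """True when lower <= version < upper, mirroring [">=1.11.0", "<1.12.0"]."""
    return parse(lower) <= parse(version) < parse(upper)


# Inclusive lower bound, exclusive upper bound.
for candidate in ("1.10.0", "1.11.0", "1.11.3", "1.12.0"):
    print(candidate, in_range(candidate))
```

Comparing integer tuples rather than raw strings avoids the trap where `"1.9.1"` sorts after `"1.11.0"` lexicographically.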
5 changes: 3 additions & 2 deletions dbt_project.yml
@@ -1,6 +1,6 @@
config-version: 2
name: 'fivetran_log'
version: '1.10.0'
version: '1.11.0'
require-dbt-version: [">=1.3.0", "<2.0.0"]

models:
@@ -22,4 +22,5 @@ vars:
destination_membership: "{{ source('fivetran_platform', 'destination_membership') }}"
log: "{{ source('fivetran_platform', 'log') }}"
user: "{{ source('fivetran_platform', 'user') }}"
usage_cost: "{{ source('fivetran_platform', 'usage_cost') }}"
usage_cost: "{{ source('fivetran_platform', 'usage_cost') }}"
transformation_runs: "{{ source('fivetran_platform','transformation_runs') }}"
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

47 changes: 37 additions & 10 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions integration_tests/ci/sample.profiles.yml
@@ -16,13 +16,13 @@ integration_tests:
pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}"
dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}"
port: 5439
schema: fivetran_platform_integration_tests_5
schema: fivetran_platform_integration_tests_6
threads: 8
bigquery:
type: bigquery
method: service-account-json
project: 'dbt-package-testing'
schema: fivetran_platform_integration_tests_5
schema: fivetran_platform_integration_tests_6
threads: 8
keyfile_json: "{{ env_var('GCLOUD_SERVICE_KEY') | as_native }}"
snowflake:
@@ -33,7 +33,7 @@ integration_tests:
role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}"
database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}"
warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}"
schema: fivetran_platform_integration_tests_5
schema: fivetran_platform_integration_tests_6
threads: 8
postgres:
type: postgres
@@ -42,13 +42,13 @@ integration_tests:
pass: "{{ env_var('CI_POSTGRES_DBT_PASS') }}"
dbname: "{{ env_var('CI_POSTGRES_DBT_DBNAME') }}"
port: 5432
schema: fivetran_platform_integration_tests_5
schema: fivetran_platform_integration_tests_6
threads: 8
databricks:
catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
schema: fivetran_platform_integration_tests_5
schema: fivetran_platform_integration_tests_6
threads: 8
token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
type: databricks
@@ -66,7 +66,7 @@ integration_tests:
server: "{{ env_var('CI_SQLSERVER_DBT_SERVER') }}"
port: 1433
database: "{{ env_var('CI_SQLSERVER_DBT_DATABASE') }}"
schema: fivetran_platform_integration_tests_5
schema: fivetran_platform_integration_tests_6
user: "{{ env_var('CI_SQLSERVER_DBT_USER') }}"
password: "{{ env_var('CI_SQLSERVER_DBT_PASS') }}"
threads: 8
6 changes: 4 additions & 2 deletions integration_tests/dbt_project.yml
@@ -1,5 +1,5 @@
name: 'fivetran_log_integration_tests'
version: '1.10.0'
version: '1.11.0'

config-version: 2
profile: 'integration_tests'
@@ -10,7 +10,7 @@ dispatch:

vars:
fivetran_log:
fivetran_platform_schema: "fivetran_platform_integration_tests_5"
fivetran_platform_schema: "fivetran_platform_integration_tests_6"
fivetran_platform_account_identifier: "account"
fivetran_platform_incremental_mar_identifier: "incremental_mar"
fivetran_platform_connector_identifier: "connector"
@@ -20,6 +20,8 @@ vars:
fivetran_platform_destination_membership_identifier: "destination_membership"
fivetran_platform_log_identifier: "log"
fivetran_platform_user_identifier: "user"
fivetran_platform_transformation_runs_identifier: "transformation_runs"


models:
fivetran_log:
3 changes: 3 additions & 0 deletions integration_tests/seeds/transformation_runs.csv
@@ -0,0 +1,3 @@
destination_id,job_id,measured_date,project_type,free_type,job_name,updated_at,model_runs,_fivetran_synced
egg_shell,formosa,2024-12-15 00:00:00.000000,QUICKSTART,PAID,Github/staging,2024-12-15 18:13:59.840000,6,2024-12-16 00:11:52.792000
egg_shell,macau,2024-12-08 00:00:00.000000,QUICKSTART,PROMOTION,Github/staging,2024-12-09 15:21:02.278000,6,2024-12-09 18:11:46.905000
17 changes: 17 additions & 0 deletions macros/get_transformation_runs_columns.sql
@@ -0,0 +1,17 @@
{% macro get_transformation_runs_columns() %}

{% set columns = [
{"name": "_fivetran_synced", "datatype": dbt.type_timestamp()},
{"name": "destination_id", "datatype": dbt.type_string()},
{"name": "free_type", "datatype": dbt.type_string()},
{"name": "job_id", "datatype": dbt.type_string()},
{"name": "job_name", "datatype": dbt.type_string()},
{"name": "measured_date", "datatype": dbt.type_timestamp()},
{"name": "model_runs", "datatype": dbt.type_int()},
{"name": "project_type", "datatype": dbt.type_string()},
{"name": "updated_at", "datatype": dbt.type_timestamp()}
] %}

{{ return(columns) }}

{% endmacro %}
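
The column list above is passed to `fivetran_utils.fill_staging_columns` in the staging model so that every expected column appears in the output even if the source table lacks it. The general idea can be sketched in Python as follows. This is an illustration under assumptions, not the `fivetran_utils` implementation: `fill_columns` is a hypothetical helper, and the datatype strings stand in for whatever `dbt.type_string()` and `dbt.type_int()` render on a given warehouse.

```python
def fill_columns(source_columns, staging_columns):
    """Build a select list: keep columns that exist, null-fill the rest."""
    present = {name.lower() for name in source_columns}
    select_items = []
    for column in staging_columns:
        name, datatype = column["name"], column["datatype"]
        if name.lower() in present:
            select_items.append(name)
        else:
            # Missing in the source: emit a typed null so downstream SQL still compiles.
            select_items.append(f"cast(null as {datatype}) as {name}")
    return select_items


expected = [
    {"name": "destination_id", "datatype": "varchar"},
    {"name": "free_type", "datatype": "varchar"},
    {"name": "model_runs", "datatype": "integer"},
]

# Hypothetical source table that lacks `free_type`.
result = fill_columns(["destination_id", "model_runs"], expected)
print(",\n".join(result))
```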
9 changes: 9 additions & 0 deletions models/fivetran_platform.yml
@@ -163,6 +163,15 @@ models:
- name: dollars_spent
description: The dollar amount used by the destination for the given month.

- name: paid_model_runs
description: The number of paid model runs for the destination for the given month.

- name: free_model_runs
description: The number of free model runs for the destination for the given month.

- name: total_model_runs
description: The total number of model runs for the destination for the given month.

- name: fivetran_platform__connector_daily_events
description: >
Table of each connector's daily history of the quantity of api calls, schema changes, and records modified,
32 changes: 25 additions & 7 deletions models/fivetran_platform__usage_mar_destination_history.sql
@@ -10,12 +10,24 @@ credits_used as (
from {{ ref('stg_fivetran_platform__credits_used') }}
),

- useage_cost as (
+ usage_cost as (

select *
from {{ ref('stg_fivetran_platform__usage_cost') }}
),

transformation_runs as (

select
destination_id,
cast(measured_month as date) as measured_month,
sum(case when free_type = 'PAID' then model_runs else 0 end) as paid_model_runs,
sum(case when free_type != 'PAID' then model_runs else 0 end) as free_model_runs,
Contributor

We've seen the casing of strings in the Fivetran Platform data change in the past, and it could change again. Can we future-proof this by simply comparing against the lowercased value?

Suggested change
- sum(case when free_type = 'PAID' then model_runs else 0 end) as paid_model_runs,
- sum(case when free_type != 'PAID' then model_runs else 0 end) as free_model_runs,
+ sum(case when lower(free_type) = 'paid' then model_runs else 0 end) as paid_model_runs,
+ sum(case when lower(free_type) != 'paid' then model_runs else 0 end) as free_model_runs,

Contributor

Hey, I saw this was added, but don't we do `upper(free_type) as free_type` in the staging model?

Contributor Author: fivetran-reneeli, Jan 22, 2025

Oh, you are 100% correct, thanks for noticing that. I think it makes more sense to handle casing in staging, but I just realized that everywhere else we apply `lower()`, albeit in the transforms. I'll apply `lower()` in staging and remove it from the transforms.

cc @fivetran-joemarkiewicz

sum(model_runs) as total_model_runs
from {{ ref('stg_fivetran_platform__transformation_runs') }}
group by destination_id, measured_month
),

destination_mar as (

select
@@ -32,14 +44,14 @@ destination_mar as (
usage as (

select
- coalesce(credits_used.destination_id, useage_cost.destination_id) as destination_id,
+ coalesce(credits_used.destination_id, usage_cost.destination_id) as destination_id,
credits_used.credits_spent,
- useage_cost.dollars_spent,
- cast(concat(coalesce(credits_used.measured_month,useage_cost.measured_month), '-01') as date) as measured_month -- match date format to join with MAR table
+ usage_cost.dollars_spent,
+ cast(concat(coalesce(credits_used.measured_month,usage_cost.measured_month), '-01') as date) as measured_month -- match date format to join with MAR table
from credits_used
- full outer join useage_cost
- on useage_cost.measured_month = credits_used.measured_month
- and useage_cost.destination_id = credits_used.destination_id
+ full outer join usage_cost
+ on usage_cost.measured_month = credits_used.measured_month
+ and usage_cost.destination_id = credits_used.destination_id
),

join_usage_mar as (
@@ -53,6 +65,9 @@ join_usage_mar as (
destination_mar.free_monthly_active_rows,
destination_mar.paid_monthly_active_rows,
destination_mar.total_monthly_active_rows,
transformation_runs.paid_model_runs,
transformation_runs.free_model_runs,
transformation_runs.total_model_runs,

-- credit and usage mar calculations
round( cast(nullif(usage.credits_spent,0) * 1000000.0 as {{ dbt.type_numeric() }}) / cast(nullif(destination_mar.total_monthly_active_rows,0) as {{ dbt.type_numeric() }}), 2) as credits_spent_per_million_mar,
@@ -63,6 +78,9 @@ join_usage_mar as (
left join usage
on destination_mar.measured_month = cast(usage.measured_month as date)
and destination_mar.destination_id = usage.destination_id
left join transformation_runs
on destination_mar.measured_month = cast(transformation_runs.measured_month as date)
and destination_mar.destination_id = transformation_runs.destination_id
)

select *
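
The aggregation in the new `transformation_runs` CTE, combined with the case-insensitive comparison suggested in the review thread, can be tried outside dbt. Below is a minimal sketch using Python's built-in `sqlite3`. It assumes SQLite's `date(..., 'start of month')` as a stand-in for the warehouse's month truncation, uses two rows loosely modeled on the seed file, and adds a third lowercase `paid` row that is purely hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transformation_runs ("
    "destination_id text, measured_date text, free_type text, model_runs integer)"
)
conn.executemany(
    "insert into transformation_runs values (?, ?, ?, ?)",
    [
        ("egg_shell", "2024-12-15", "PAID", 6),
        ("egg_shell", "2024-12-08", "PROMOTION", 6),
        ("egg_shell", "2024-12-20", "paid", 4),  # hypothetical row with different casing
    ],
)

# lower(free_type) makes the paid/free split independent of source casing.
query = """
select
    destination_id,
    date(measured_date, 'start of month') as measured_month,
    sum(case when lower(free_type) = 'paid' then model_runs else 0 end) as paid_model_runs,
    sum(case when lower(free_type) != 'paid' then model_runs else 0 end) as free_model_runs,
    sum(model_runs) as total_model_runs
from transformation_runs
group by 1, 2
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With a case-sensitive `free_type = 'PAID'` comparison, the lowercase row would be counted as free, which is the failure mode the reviewer's suggestion guards against.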
23 changes: 23 additions & 0 deletions models/staging/src_fivetran_platform.yml
@@ -200,5 +200,28 @@ sources:
description: Phone number associated with user.
- name: verified
description: Boolean that indicates whether the user has verified their email address in the account creation process.
- name: _fivetran_synced
description: Timestamp of when the record was last synced.

- name: transformation_runs
identifier: '{{ var("fivetran_platform_transformation_runs_identifier", "transformation_runs") }}'
description: Table of all transformation runs that have been executed by Fivetran.
columns:
- name: destination_id
description: Foreign key referencing the destination for which model runs are calculated.
- name: job_id
description: Foreign key referencing the job affiliated with the transformation run.
- name: measured_date
description: The date (UTC) on which the models were run.
- name: project_type
description: The type of transformation project that was run (e.g., DBT_CORE, QUICKSTART).
- name: free_type
description: For free model runs, the value indicates the type of free usage (e.g., PROMOTION). For paid model runs, the value is `PAID`.
- name: job_name
description: The name of the transformation job that was run.
- name: updated_at
description: Timestamp of when the record was last updated.
- name: model_runs
description: The number of models that were run in the transformation.
- name: _fivetran_synced
description: Timestamp of when the record was last synced.
27 changes: 26 additions & 1 deletion models/staging/stg_fivetran_platform.yml
@@ -187,4 +187,29 @@ models:
- name: phone
description: Phone number associated with user.
- name: is_verified
description: Boolean that indicates whether the user has verified their email address in the account creation process.
description: Boolean that indicates whether the user has verified their email address in the account creation process.

- name: stg_fivetran_platform__transformation_runs
description: Table of all transformation runs that have been executed by Fivetran.
columns:
- name: destination_id
description: Foreign key referencing the destination for which model runs are calculated.
- name: job_id
description: Foreign key referencing the job affiliated with the transformation run.
- name: measured_date
description: The date (UTC) on which the models were run.
- name: measured_month
description: The month of the transformation run, derived by truncating `measured_date` to the first day of the month.
- name: project_type
description: The type of transformation project that was run (e.g., DBT_CORE, QUICKSTART).
- name: free_type
description: For free model runs, the value indicates the type of free usage (e.g., PROMOTION). For paid model runs, the value is `PAID`.
- name: job_name
description: The name of the transformation job that was run.
- name: updated_at
description: Timestamp of when the record was last updated.
- name: model_runs
description: The number of models that were run in the transformation.
- name: _fivetran_synced
description: Timestamp of when the record was last synced.

42 changes: 42 additions & 0 deletions models/staging/stg_fivetran_platform__transformation_runs.sql
Contributor Author

Now that we're using the `does_table_exist` macro, we need a conditional block that passes all-null fields when the table does not exist. I believe that doesn't work with the tmp model, so I removed it and instead wrapped everything in the staging model as shown. This follows what's done in the `stg_fivetran_platform__usage_cost` model.

@@ -0,0 +1,42 @@

with base as (

select *
from {{ ref('stg_fivetran_platform__transformation_runs_tmp') }}
),

fields as (

select
{{
fivetran_utils.fill_staging_columns(
source_columns=adapter.get_columns_in_relation(ref('stg_fivetran_platform__transformation_runs_tmp')),
staging_columns=get_transformation_runs_columns()
)
}}
{{ fivetran_utils.source_relation(
union_schema_variable='fivetran_platform_union_schemas',
union_database_variable='fivetran_platform_union_databases')
}}
from base
),

final as (

select
_fivetran_synced,
destination_id,
free_type,
job_id,
job_name,
cast(measured_date as {{ dbt.type_timestamp() }}) as measured_date,
model_runs,
project_type,
updated_at
from fields
)

select
*,
{{ dbt.date_trunc('month', 'measured_date') }} as measured_month
from final
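
The staging model above derives `measured_month` by truncating a timestamp, whereas the `usage` CTE in `fivetran_platform__usage_mar_destination_history` builds its month key by appending `'-01'` to a `yyyy-mm` string and casting to a date. Both need to land on the same first-of-month date for the joins to line up. A small Python sketch of that equivalence follows; the helper names are hypothetical, and the dbt macros themselves render warehouse-specific SQL rather than anything like this.

```python
from datetime import date, datetime


def month_from_string(measured_month: str) -> date:
    """Mimic cast(concat('yyyy-mm', '-01') as date) from the usage CTE."""
    return datetime.strptime(measured_month + "-01", "%Y-%m-%d").date()


def month_from_timestamp(measured_date: datetime) -> date:
    """Mimic date_trunc('month', measured_date) followed by a cast to date."""
    return measured_date.replace(day=1).date()


usage_key = month_from_string("2024-12")
runs_key = month_from_timestamp(datetime(2024, 12, 15, 18, 13, 59))

# Both keys resolve to the first day of the month, so an equality join matches.
print(usage_key, runs_key, usage_key == runs_key)
```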
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
select *
from {{ var('transformation_runs') }}