Athena support POC #1
base: master
@@ -43,4 +43,17 @@
from information_schema.columns
where upper(table_schema) = upper('{{ schema_name }}')
{% endmacro %}

{% macro athena__get_columns_from_information_schema(database_name, schema_name) %}
    select
        upper(table_catalog || '.' || table_schema || '.' || table_name) as full_table_name,
        upper(table_catalog) as database_name,
        upper(table_schema) as schema_name,
        upper(table_name) as table_name,
        upper(column_name) as column_name,
        data_type
    from information_schema.columns
    where upper(table_schema) = upper('{{ schema_name }}')
{% endmacro %}

Review comment: FYI: this query is going to be super slow - did you think about using a dbt-athena method to make this work?
Reply: Thanks for pointing this out. I noticed dbt-athena switched to the Glue APIs some time ago. It should definitely be checked once the port is working properly overall.
Reply: The issue is that in order to use the Glue API you have to do an adapter.dispatch call from your macro, as it's not possible to make Python invocations that are outside the scope of the adapter.
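For reference, the SQL that the new `athena__get_columns_from_information_schema` macro renders once Jinja substitutes `{{ schema_name }}` can be sketched in plain Python (the helper name is illustrative, not part of the PR):

```python
def get_columns_query(schema_name: str) -> str:
    # Illustrative helper (not in the PR): the SQL rendered by
    # athena__get_columns_from_information_schema after Jinja
    # substitutes {{ schema_name }}.
    return f"""select
    upper(table_catalog || '.' || table_schema || '.' || table_name) as full_table_name,
    upper(table_catalog) as database_name,
    upper(table_schema) as schema_name,
    upper(table_name) as table_name,
    upper(column_name) as column_name,
    data_type
from information_schema.columns
where upper(table_schema) = upper('{schema_name}')"""
```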
@@ -1,7 +1,19 @@
{%- macro edr_cast_as_timestamp(timestamp_field) -%}
    {{ return(adapter.dispatch('edr_cast_as_timestamp', 'elementary')(timestamp_field)) }}
{%- endmacro -%}

{%- macro default__edr_cast_as_timestamp(timestamp_field) -%}
    cast({{ timestamp_field }} as {{ elementary.edr_type_timestamp() }})
{%- endmacro -%}

{# Athena needs explicit conversion for ISO8601 timestamps used in buckets_cte #}
{%- macro athena__edr_cast_as_timestamp(timestamp_field) -%}
    coalesce(
        try_cast({{ timestamp_field }} as {{ elementary.edr_type_timestamp() }}),
        cast(from_iso8601_timestamp(cast({{ timestamp_field }} AS {{ elementary.edr_type_string() }})) AS {{ elementary.edr_type_timestamp() }})
    )
{%- endmacro -%}

{%- macro edr_cast_as_float(column) -%}
    cast({{ column }} as {{ elementary.edr_type_float() }})
{%- endmacro -%}

@@ -43,6 +55,15 @@
    cast({{ elementary.edr_cast_as_timestamp(timestamp_field) }} as {{ elementary.edr_type_date() }})
{%- endmacro -%}

{# Athena needs explicit conversion for ISO8601 timestamps used in buckets_cte #}
{%- macro athena__edr_cast_as_date(timestamp_field) -%}
    coalesce(
        try_cast({{ timestamp_field }} as {{ elementary.edr_type_date() }}),
        cast(from_iso8601_timestamp(cast({{ timestamp_field }} AS {{ elementary.edr_type_string() }})) AS {{ elementary.edr_type_date() }})
    )
{%- endmacro -%}

{%- macro const_as_text(string) -%}
    {{ return(adapter.dispatch('const_as_text', 'elementary')(string)) }}
{%- endmacro -%}
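The coalesce/try_cast fallback above can be mimicked in Python to show the intended behavior: a plain `YYYY-MM-DD HH:MM:SS` value casts directly, an ISO8601 value with a `T` separator goes through the `from_iso8601_timestamp` fallback, and anything else becomes NULL. This is an illustrative sketch, not code from the PR:

```python
from datetime import datetime

def edr_cast_as_timestamp(value: str):
    # Python analogue of coalesce(try_cast(...), from_iso8601_timestamp(...)):
    # try the plain SQL timestamp format first, then ISO8601 with a 'T'
    # separator; None plays the role of SQL NULL when both casts fail.
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None
```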
@@ -43,6 +43,11 @@
    {% do return("string") %}
{% endmacro %}

{% macro athena__edr_type_string() %}
    {% do return("varchar") %}
{% endmacro %}

{%- macro edr_type_long_string() -%}

@@ -93,6 +98,10 @@
{% macro edr_type_timestamp() %}
    {{ return(adapter.dispatch('edr_type_timestamp', 'elementary')()) }}
{% endmacro %}

{% macro default__edr_type_timestamp() %}
    {% set macro = dbt.type_timestamp or dbt_utils.type_timestamp %}
    {% if not macro %}
        {{ exceptions.raise_compiler_error("Did not find a `type_timestamp` macro.") }}

@@ -121,3 +130,13 @@
{% macro bigquery__edr_type_date() %}
    date
{% endmacro %}

{% macro athena__edr_type_timestamp() %}
    {%- set config = model.get('config', {}) -%}
    {%- set table_type = config.get('table_type', 'glue') -%}
    {%- if table_type == 'iceberg' -%}
        timestamp(6)
    {%- else -%}
        timestamp
    {%- endif -%}
{% endmacro %}

Review comment (truncated): based on …
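The table-type branch in `athena__edr_type_timestamp` reduces to a small lookup; a Python sketch (function name illustrative, not part of the PR) of the same decision:

```python
def athena_timestamp_type(model_config: dict) -> str:
    # Illustrative sketch of athena__edr_type_timestamp: Iceberg tables
    # use microsecond-precision timestamp(6), while Hive/Glue tables use
    # the plain timestamp type. 'glue' is the default, as in the macro.
    table_type = model_config.get("table_type", "glue")
    return "timestamp(6)" if table_type == "iceberg" else "timestamp"
```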
@@ -25,3 +25,12 @@
    {% do run_query(dbt.create_table_as(temporary, relation, sql_query)) %}
    {% do adapter.commit() %}
{% endmacro %}

{% macro athena__create_or_replace(temporary, relation, sql_query) %}
    {% set drop_query %}
        drop table if exists {{ relation.schema }}.{{ relation.identifier }}
    {% endset %}
    {% do elementary.run_query(drop_query) %}
    {% do run_query(dbt.create_table_as(temporary, relation, sql_query)) %}
    {% do adapter.commit() %}
{% endmacro %}

Review comment (truncated): TODO: …
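The Athena variant replaces a table by dropping it and then recreating it, as two separate queries rather than one atomic statement. A minimal Python sketch of that sequence, with `run_query` as a hypothetical stand-in callback (not dbt's actual API):

```python
def create_or_replace(run_query, schema: str, identifier: str, create_sql: str) -> None:
    # Sketch of athena__create_or_replace: drop the old table, then
    # recreate it. The two statements are separate queries, so the
    # replacement is not atomic (a reader in between sees no table).
    run_query(f"drop table if exists {schema}.{identifier}")
    run_query(create_sql)
```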
@@ -111,12 +111,22 @@
    {{- return(string_value | replace("'", "''")) -}}
{%- endmacro -%}

{%- macro athena__escape_special_chars(string_value) -%}
    {{- return(string_value | replace("'", "''")) -}}
{%- endmacro -%}

{%- macro render_value(value) -%}
    {%- if value is defined and value is not none -%}
        {%- if value is number -%}
            {{- value -}}
        {%- elif value is string -%}
            {%- if value.startswith('cast(') -%}
                {{- value -}}
            {%- elif value.startswith('coalesce(') -%}
                {{- value -}}
            {%- else -%}
                '{{- elementary.escape_special_chars(value) -}}'
            {%- endif -%}
        {%- elif value is mapping or value is sequence -%}
            '{{- elementary.escape_special_chars(tojson(value)) -}}'
        {%- else -%}

Review comment: Inserts with timestamp strings should be cast explicitly in Athena. How do we handle that properly?
Reply: I think the clean thing would be to pass this function the actual type of the column we are rendering to (truncated).
Reply: Thanks @haritamar, looks really straightforward. Do you see any problem with using (truncated)?
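The prefix checks above let pre-built SQL expressions pass through unquoted while plain strings get single-quote escaping. A Python sketch of the same decision logic (illustrative, not the macro itself):

```python
import json

def render_value(value):
    # Sketch of the render_value macro's logic: render a Python value as
    # a SQL literal. Strings that already look like SQL expressions
    # (cast(...)/coalesce(...)) pass through unquoted; other strings get
    # '' escaping; mappings and sequences are serialized as JSON strings.
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        if value.startswith(("cast(", "coalesce(")):
            return value  # already a SQL expression, do not quote
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, (dict, list)):
        return "'" + json.dumps(value).replace("'", "''") + "'"
    return str(value)
```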
@@ -19,3 +19,29 @@
    {% do return(macro(target_relation, tmp_relation, unique_key, dest_columns)) %}
    {{ return(merge_sql) }}
{% endmacro %}

{% macro athena__merge_sql(target_relation, tmp_relation, unique_key, dest_columns, incremental_predicates) %}
    {% set query %}
        merge into {{ target_relation }} as target using {{ tmp_relation }} as src
        on (target.{{ unique_key }} = src.{{ unique_key }})
        when matched
            then update set
            {%- for col in dest_columns %}
                {{ col.column }} = src.{{ col.column }}{{ ", " if not loop.last }}
            {%- endfor %}
        when not matched
            then insert (
            {%- for col in dest_columns %}
                {{ col.column }}{{ ", " if not loop.last }}
            {%- endfor %}
            )
            values (
            {%- for col in dest_columns %}
                src.{{ col.column }}{{ ", " if not loop.last }}
            {%- endfor %}
            )
    {% endset %}
    {% do return(query) %}
{% endmacro %}

Review comment: Ideally, this should be done in dbt-athena.
Reply: @artem-garmash currently we have https://github.com/dbt-athena/dbt-athena/blob/main/dbt/include/athena/macros/materializations/models/incremental/merge.sql#L50 - did you have a look?
Reply: Sure, I was using it for reference. But that (truncated).
Reply: It doesn't seem that we overwrite (truncated).
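For a table with columns `id` and `name`, the macro above renders a merge statement of the following shape; the Python sketch below (illustrative names, not the macro's exact output formatting) builds the same structure:

```python
def athena_merge_sql(target: str, src: str, unique_key: str, columns: list) -> str:
    # Illustrative sketch of the statement athena__merge_sql renders:
    # update matched rows column by column, insert unmatched rows.
    set_clause = ", ".join(f"{c} = src.{c}" for c in columns)
    col_list = ", ".join(columns)
    src_list = ", ".join(f"src.{c}" for c in columns)
    return (
        f"merge into {target} as target using {src} as src "
        f"on (target.{unique_key} = src.{unique_key}) "
        f"when matched then update set {set_clause} "
        f"when not matched then insert ({col_list}) values ({src_list})"
    )
```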
Review comment: What if I run the CLI from a setup that requires the usage of an AWS session token? E.g. I'm running the CLI from a setup where I use AWS SSO - that might need to be revisited to support these cases.
Reply: What do you mean? It's just a copy of the dbt-athena config from https://github.com/dbt-athena/dbt-athena#configuring-your-profile, and dbt is used by the CLI to query elementary tables. The CLI is used the same way as dbt with dbt-athena. E.g. if I'm using AWS SSO, aws_profile_name is used and the login is outside of the scope of dbt/elementary.
Reply: When working with AWS SSO setups, sometimes an extra variable needs to be passed if we get the keys from something like aws-vault; the key is aws_session_token, and I was wondering if we need to include that. But you are right: in most cases, if the user uses aws sso login, using the AWS profile is enough.