
Athena support POC #1

Open · wants to merge 26 commits into master
Conversation

artem-garmash (Owner) commented Jun 13, 2023

POC of Athena support for elementary, limited to Athena engine version 3 and Iceberg tables for incremental elementary tables. Tested with dbt-core 1.5.0, dbt-athena-community 1.5.0 and lalalilo/athena_utils 0.3.0.
Working features based on ad-hoc tests:

  • elementary dbt artifact tables (dbt_*, elementary_test_results)
  • dbt and elementary tests (volume_anomalies, column_anomalies, dimension_anomalies)
  • one page report (edr report)
  • alert tables and sending slack alerts (edr monitor)

cast({{ timestamp_field }} as {{ elementary.edr_type_timestamp() }})
{%- endmacro -%}

{# Athena needs explicit conversion for ISO8601 timestamps used in buckets_cte #}
artem-garmash (Owner Author):
agate_to_dicts converts timestamps to ISO8601 format and Athena needs explicit handling
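As an illustration of the issue (a hypothetical query, not code from this PR): Athena will not implicitly coerce an ISO8601 string literal when comparing it against a timestamp column, so the literal needs an explicit cast. The exact target type (`timestamp` vs `timestamp(6)`) depends on the table format:

```sql
-- Hypothetical example: a string literal as produced by agate_to_dicts
-- must be cast explicitly before comparison with a timestamp column.
select *
from buckets_cte
where bucket_start >= cast('2023-06-13 00:00:00.000' as timestamp(6))
```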

@@ -121,3 +130,13 @@
{% macro bigquery__edr_type_date() %}
date
{% endmacro %}

{% macro athena__edr_type_timestamp() %}
artem-garmash (Owner Author):
based on cast_timestamp from dbt-athena
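For reference, a minimal sketch of what such a macro could look like; the timestamp(6) precision is an assumption modeled on dbt-athena's cast_timestamp handling of Iceberg tables:

```sql
{% macro athena__edr_type_timestamp() %}
    {# Assumed: Iceberg tables in Athena store microsecond-precision
       timestamps, hence timestamp(6) rather than plain timestamp. #}
    timestamp(6)
{% endmacro %}
```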

@@ -25,3 +25,12 @@
{% do run_query(dbt.create_table_as(temporary, relation, sql_query)) %}
{% do adapter.commit() %}
{% endmacro %}

{% macro athena__create_or_replace(temporary, relation, sql_query) %}
{% set drop_query %}
artem-garmash (Owner Author):
TODO: relation doesn't have type for some reason when this is called during the initial run to build elementary tables. drop_relation fails as it checks for type.
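One possible workaround, sketched here as an assumption rather than the PR's final code, is to bypass drop_relation and issue the DROP statement directly, since a plain DROP TABLE IF EXISTS does not depend on the relation type:

```sql
{% macro athena__create_or_replace(temporary, relation, sql_query) %}
    {# Sketch: relation.type can be undefined on the initial run that builds
       the elementary tables, and adapter.drop_relation checks it, so drop
       via raw SQL instead. #}
    {% set drop_query %}
        drop table if exists {{ relation }}
    {% endset %}
    {% do run_query(drop_query) %}
    {% do run_query(dbt.create_table_as(temporary, relation, sql_query)) %}
    {% do adapter.commit() %}
{% endmacro %}
```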

{%- macro render_value(value) -%}
{%- if value is defined and value is not none -%}
{%- if value is number -%}
{{- value -}}
{%- elif value is string -%}
'{{- elementary.escape_special_chars(value) -}}'
artem-garmash (Owner Author):
Inserts with timestamp strings should be cast explicitly in Athena. How should this be handled properly?

haritamar:

I think that the clean thing would be to pass to this function the actual type of the column we are rendering to from insert_rows:

  1. insert_rows already calls adapter.get_columns_in_relation so we can pass column.type as another parameter to render_value
  2. We can normalize the DB data type with the elementary.normalize_data_type and then treat the value differently if it's a timestamp.
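The two steps above could be sketched roughly as follows; the parameter name and the normalized type value are assumptions, not the agreed implementation:

```sql
{%- macro render_value(value, data_type=none) -%}
  {%- if value is defined and value is not none -%}
    {%- if value is number -%}
      {{- value -}}
    {%- elif data_type == 'timestamp' -%}
      {# Assumed: data_type is column.type passed down from insert_rows,
         normalized with elementary.normalize_data_type #}
      cast('{{- value -}}' as {{ elementary.edr_type_timestamp() }})
    {%- elif value is string -%}
      '{{- elementary.escape_special_chars(value) -}}'
    {%- endif -%}
  {%- endif -%}
{%- endmacro -%}
```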

artem-garmash (Owner Author):
Thanks @haritamar, that looks really straightforward. Do you see any problem with using edr_cast_as_timestamp for timestamp literals? It's needed for Athena to add the timestamp type, but it will also change timestamp literals for other connectors. Or should I add a dedicated macro to handle timestamp literals that would be a no-op for all but Athena?

@@ -19,3 +19,29 @@
{% do return(macro(target_relation, tmp_relation, unique_key, dest_columns)) %}
{{ return(merge_sql) }}
{% endmacro %}

{% macro athena__merge_sql(target_relation, tmp_relation, unique_key, dest_columns, incremental_predicates) %}
artem-garmash (Owner Author):

Ideally, this should be done in dbt-athena
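For context, a rough sketch of what such a macro could generate (column handling simplified; this is not the PR's exact code, and Athena engine 3 supports MERGE INTO for Iceberg tables only):

```sql
{% macro athena__merge_sql(target_relation, tmp_relation, unique_key, dest_columns, incremental_predicates) %}
    {%- set dest_cols = dest_columns | map(attribute="name") | list -%}
    merge into {{ target_relation }} as target
    using {{ tmp_relation }} as src
        on target.{{ unique_key }} = src.{{ unique_key }}
    when matched then update set
        {%- for col in dest_cols %}
        {{ col }} = src.{{ col }}{{ "," if not loop.last }}
        {%- endfor %}
    when not matched then insert ({{ dest_cols | join(", ") }})
        values ({% for col in dest_cols %}src.{{ col }}{{ ", " if not loop.last }}{% endfor %})
{% endmacro %}
```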

nicor88 commented Aug 22, 2023
artem-garmash (Owner Author):

Sure, I was using it for reference. But that iceberg_merge runs the merge query, while here we need the SQL only. https://github.com/artem-garmash/dbt-data-reliability/blob/master/macros/utils/table_operations/merge_sql.sql#L7 refers to dbt.merge_sql, which is part of dbt-core (https://github.com/dbt-labs/dbt-core/blob/7eedfcd2742fcf789200a4b829c8c2eb98369089/core/dbt/include/global_project/macros/materializations/models/incremental/merge.sql#L4). Is it somehow available from dbt-athena?

nicor88:
It doesn't seem that we override get_merge_sql, so maybe in this case it's fine to leave athena__merge_sql here.

@@ -4,8 +4,10 @@
unique_key='id',
on_schema_change='append_new_columns',
full_refresh=elementary.get_config_var('elementary_full_refresh'),
meta={"timestamp_column": "updated_at"}
meta={"timestamp_column": "updated_at"},
table_type="iceberg",
artem-garmash (Owner Author):

Is there a better way to provide connector-specific config properties for some models?


Nothing special, I guess this should work:

table_type="iceberg" if target.type == 'athena' else none

artem-garmash (Owner Author):

Okay, I thought maybe there is some way to avoid having table_type=None for other connectors (e.g. to avoid possible conflicts). As an example, dbt_columns.sql has materialized=elementary.get_dbt_columns_materialization() to get a connector-specific value, but here it would have to work for a set of properties, something like kwargs.
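One way to sketch the kwargs idea (the macro name here is hypothetical): a helper that returns a dict of connector-specific config properties, so non-Athena targets never see the key at all. Whether config(**...) expansion is accepted by dbt's Jinja context would still need to be verified:

```sql
{# Hypothetical helper: returns connector-specific config properties #}
{% macro get_elementary_incremental_config() %}
    {% if target.type == 'athena' %}
        {% do return({"table_type": "iceberg"}) %}
    {% endif %}
    {% do return({}) %}
{% endmacro %}
```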

@@ -1,7 +1,7 @@
{{
config(
materialized = 'view',
enabled = elementary.get_config_var('enable_dbt_columns') and target.type != 'databricks' and target.type != 'spark' | as_bool()
enabled = elementary.get_config_var('enable_dbt_columns') and target.type != 'databricks' and target.type != 'spark' and target.type != 'athena' | as_bool()
artem-garmash (Owner Author):

This probably would work fine; I need to re-check it.

Comment on lines +89 to +90
aws_access_key_id: "<AWS_ACCESS_KEY_ID>"
aws_secret_access_key: "<AWS_SECRET_ACCESS_KEY>"
nicor88:

What if I run the CLI from a setup that requires the usage of an AWS session token? E.g. I'm running the CLI from a setup where I use AWS SSO. That might need to be revisited to support these cases.

artem-garmash (Owner Author):

What do you mean? It's just a copy of the dbt-athena config from https://github.com/dbt-athena/dbt-athena#configuring-your-profile, and dbt is used by the CLI to query elementary tables. The CLI is used the same way as dbt with dbt-athena. E.g. if I'm using AWS SSO, aws_profile_name is used and the login is outside the scope of dbt/elementary.

nicor88:

When working with AWS SSO setups, sometimes an extra variable needs to be passed: if we get the keys from something like aws-vault, the key is aws_session_token, and I was wondering if we need to include that.
But you are right, in most cases, if the user uses aws sso login, using the AWS profile is enough.

upper(column_name) as column_name,
data_type
from information_schema.columns
nicor88:

FYI: this query is going to be super slow. Did you think about using a dbt-athena method to make this work?
E.g. we can add another method in the adapter to leverage the Glue APIs; those are going to be a few orders of magnitude faster.
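A sketch of how the dispatch hook could look (macro names are assumptions; the Athena implementation would depend on dbt-athena exposing a Glue-backed adapter method, which is exactly the point raised here):

```sql
{% macro get_columns_data_types(schema_name, table_name) %}
    {{ return(adapter.dispatch('get_columns_data_types', 'elementary')(schema_name, table_name)) }}
{% endmacro %}

{# Default: query information_schema (slow on Athena) #}
{% macro default__get_columns_data_types(schema_name, table_name) %}
    select upper(column_name) as column_name, data_type
    from information_schema.columns
    where table_schema = '{{ schema_name }}' and table_name = '{{ table_name }}'
{% endmacro %}
```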

artem-garmash (Owner Author):

Thanks for pointing this out. I noticed dbt-athena switched to the Glue APIs some time ago. It should definitely be checked once the port is working properly overall.

nicor88:

The issue is that in order to use the Glue API you have to do an adapter.dispatch call from your macro, as it's not possible to make Python invocations that are outside the scope of the adapter.
As said, we can expose all that we need in the adapter if necessary.

nicor88 commented Aug 22, 2023

@artem-garmash great job! I left some minor comments; overall it looks great 💯

@@ -1,6 +1,6 @@
{{
config(
materialized = 'view',
materialized = 'table' if target.type == 'athena' else 'view',


Why does it need to be a table in Athena?

artem-garmash (Owner Author):

information_schema cannot be used from a view in Athena (https://docs.aws.amazon.com/athena/latest/ug/querying-glue-catalog.html):

> You cannot use CREATE VIEW to create a view on the information_schema database.

artem-garmash pushed a commit that referenced this pull request Aug 31, 2023: Moss adding hour of day and week seasonality
@artem-garmash artem-garmash changed the title Athena support Athena support POC Sep 6, 2023
@artem-garmash artem-garmash mentioned this pull request Sep 6, 2023