Commit 5fdca89

Merge branch 'master' of github.com:fishtown-analytics/dbt-external-tables into fix/snowpipe-copy-commit

jtcohen6 committed May 25, 2021
2 parents 970452b + a3b5619
Showing 23 changed files with 178 additions and 111 deletions.
33 changes: 26 additions & 7 deletions .circleci/config.yml
@@ -1,11 +1,14 @@

version: 2.1

+   orbs:
+     azure-cli: circleci/[email protected]

jobs:

integration-redshift:
docker:
-     - image: circleci/python:3.6.3-stretch
+     - image: circleci/python:3.6.13-stretch
steps:
- checkout
- run:
@@ -16,7 +19,7 @@ jobs:

integration-snowflake:
docker:
-     - image: circleci/python:3.6.3-stretch
+     - image: circleci/python:3.6.13-stretch
steps:
- checkout
- run:
@@ -29,7 +32,7 @@
environment:
BIGQUERY_SERVICE_KEY_PATH: "/home/circleci/bigquery-service-key.json"
docker:
-     - image: circleci/python:3.6.3-stretch
+     - image: circleci/python:3.6.13-stretch
steps:
- checkout
- run:
@@ -63,9 +66,23 @@ jobs:
- image: dataders/pyodbc:1.2
steps:
- checkout
+     - run: &gnupg2
+         name: az cli dep
+         command: apt-get install gnupg2 -y
+     - azure-cli/install
+     - azure-cli/login-with-service-principal: &azure-creds
+         azure-sp: DBT_AZURE_SP_NAME
+         azure-sp-password: DBT_AZURE_SP_SECRET
+         azure-sp-tenant: DBT_AZURE_TENANT
+     - run:
+         name: resume Synapse pool/db
+         command: az synapse sql pool resume --name $DBT_SYNAPSE_DB --workspace-name $DBT_SYNAPSE_SERVER --resource-group dbt-msft
+     - run:
+         name: "Run Tests - synapse"
+         command: ./run_test.sh synapse
+     - run:
+         name: pause Synapse pool/db
+         command: az synapse sql pool pause --name $DBT_SYNAPSE_DB --workspace-name $DBT_SYNAPSE_SERVER --resource-group dbt-msft
- store_artifacts:
path: ./logs

@@ -74,9 +91,9 @@ jobs:
- image: dataders/pyodbc:1.2
steps:
- checkout
-     - run:
-         name: "wait for Synapse tests to finish"
-         command: sleep 60
+     - run: *gnupg2
+     - azure-cli/install
+     - azure-cli/login-with-service-principal: *azure-creds
- run:
name: "Run Tests - azuresql"
command: ./run_test.sh azuresql
@@ -93,4 +110,6 @@ workflows:
- integration-bigquery
- integration-databricks
- integration-synapse
-     - integration-azuresql
+     - integration-azuresql:
+         requires:
+           - integration-synapse
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -6,4 +6,4 @@ Describe your changes, and why you're making them.
## Checklist
- [ ] I have verified that these changes work locally
- [ ] I have updated the README.md (if applicable)
-   - [ ] I have added tests & descriptions to my models (and macros if applicable)
+   - [ ] I have added an integration test for my fix/feature (if applicable)
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@

**/target/
**/dbt_modules/
**/logs/
**/env/
**/venv/
12 changes: 8 additions & 4 deletions README.md
@@ -1,6 +1,6 @@
# External sources in dbt

-   dbt v0.15.0 added support for an `external` property within `sources` that can include information about `location`, `partitions`, and other database-specific properties.
+   dbt v0.15.0 [added support](https://github.com/fishtown-analytics/dbt/pull/1784) for an `external` property within `sources` that can include information about `location`, `partitions`, and other database-specific properties.

This package provides:
* Macros to create/replace external tables and refresh their partitions, using the metadata provided in your `.yml` file source definitions
@@ -17,6 +17,10 @@ This package provides:

![sample docs](etc/sample_docs.png)

+   ## Installation
+
+   Follow the instructions at [hub.getdbt.com](https://hub.getdbt.com/fishtown-analytics/dbt_external_tables/latest/) on how to modify your `packages.yml` and run `dbt deps`.
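
As an illustration, a minimal `packages.yml` entry might look like the sketch below (the version range is a placeholder, not a pinned recommendation; check hub.getdbt.com for the current release):

```yaml
packages:
  - package: fishtown-analytics/dbt_external_tables
    version: [">=0.6.0", "<0.7.0"]  # placeholder range; confirm on hub.getdbt.com
```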

## Syntax

The `stage_external_sources` macro is the primary point of entry when using this package. It has two operational modes: standard and "full refresh."
@@ -26,18 +30,18 @@
$ dbt run-operation stage_external_sources

# iterate through all source nodes, create or replace (+ refresh if necessary)
-   $ dbt run-operation stage_external_sources --vars 'ext_full_refresh: true'
+   $ dbt run-operation stage_external_sources --vars "ext_full_refresh: true"
```

The `stage_external_sources` macro accepts a limited node selection syntax similar to
[snapshotting source freshness](https://docs.getdbt.com/docs/running-a-dbt-project/command-line-interface/source/#specifying-sources-to-snapshot):

```bash
# stage all Snowplow and Logs external sources:
-   $ dbt run-operation stage_external_sources --args 'select: snowplow logs'
+   $ dbt run-operation stage_external_sources --args "select: snowplow logs"

# stage a particular external source table:
-   $ dbt run-operation stage_external_sources --args 'select: snowplow.event'
+   $ dbt run-operation stage_external_sources --args "select: snowplow.event"
```

## Setup
5 changes: 0 additions & 5 deletions integration_tests/.gitignore

This file was deleted.

14 changes: 4 additions & 10 deletions integration_tests/ci/sample.profiles.yml
@@ -50,16 +50,13 @@ integration_tests:
schema: dbt_external_tables_integration_tests_databricks

synapse:
-     type: sqlserver
+     type: synapse
driver: "ODBC Driver 17 for SQL Server"
port: 1433
-     host: "{{ env_var('DBT_SYNAPSE_SERVER') }}"
+     host: "{{ env_var('DBT_SYNAPSE_SERVER') }}.sql.azuresynapse.net"
database: "{{ env_var('DBT_SYNAPSE_DB') }}"
-     username: "{{ env_var('DBT_SYNAPSE_UID') }}"
-     password: "{{ env_var('DBT_SYNAPSE_PWD') }}"
+     authentication: CLI
schema: dbt_external_tables_integration_tests_synapse
encrypt: 'yes'
trust_cert: 'yes'
threads: 1

azuresql:
@@ -68,9 +65,6 @@ integration_tests:
port: 1433
host: "{{ env_var('DBT_AZURESQL_SERVER') }}"
database: "{{ env_var('DBT_AZURESQL_DB') }}"
-     username: "{{ env_var('DBT_AZURESQL_UID') }}"
-     password: "{{ env_var('DBT_AZURESQL_PWD') }}"
+     authentication: CLI
schema: dbt_external_tables_integration_tests_azuresql
encrypt: yes
trust_cert: yes
threads: 1
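
Both profiles now use `authentication: CLI`, which picks up credentials from an active Azure CLI session instead of a username/password pair. As a sketch, the CircleCI orb's service-principal login step above resolves to roughly this command (same environment variables as in the CI config; not the orb's verbatim internals):

```bash
az login --service-principal \
  --username "$DBT_AZURE_SP_NAME" \
  --password "$DBT_AZURE_SP_SECRET" \
  --tenant "$DBT_AZURE_TENANT"
```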
16 changes: 7 additions & 9 deletions integration_tests/dbt_project.yml
@@ -4,8 +4,6 @@ version: '1.0'

profile: 'integration_tests'

-   # require-dbt-version: inherit this from package

config-version: 2

source-paths: ["models"]
@@ -14,14 +12,17 @@ test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
target-path: "target"
clean-targets:
- "target"
- "dbt_modules"

+   vars:
+     dbt_external_tables_dispatch_list: ['dbt_external_tables_integration_tests']
+
+   seeds:
+     +quote_columns: false

sources:
dbt_external_tables_integration_tests:
plugins:
@@ -34,9 +35,6 @@ sources:
spark_external:
+enabled: "{{ target.type == 'spark' }}"
synapse_external:
+enabled: "{{ target.name == 'synapse' }}"
+enabled: "{{ target.type == 'synapse' }}"
azuresql_external:
+enabled: "{{ target.name == 'azuresql' }}"

seeds:
quote_columns: false
+enabled: "{{ target.type == 'sqlserver' }}"
12 changes: 8 additions & 4 deletions integration_tests/macros/plugins/sqlserver/prep_external.sql
@@ -2,7 +2,7 @@

{% set external_data_source = target.schema ~ '.dbt_external_tables_testing' %}

{% if target.name == "synapse"%}
{% if target.type == "synapse"%}

{% set create_external_data_source %}
IF NOT EXISTS ( SELECT * FROM sys.external_data_sources WHERE name = '{{external_data_source}}' )
@@ -29,7 +32,7 @@
)
{% endset %}

-   {% elif target.name == "azuresql" %}
+   {% elif target.type == "sqlserver" %}

{% set cred_name = 'synapse_reader' %}

@@ -55,17 +55,21 @@
{%- endif %}


{% if target.name == "azuresql" -%}
{% if target.type == "sqlserver" -%}
{% do log('Creating database scoped credential ' ~ cred_name, info = true) %}
{% do run_query(create_database_scoped_credential) %}
{%- endif %}

{% do log('Creating external data source ' ~ external_data_source, info = true) %}
{% do run_query(create_external_data_source) %}

{% if target.name == "synapse" -%}
{% if target.type == "synapse" -%}
{% do log('Creating external file format ' ~ external_file_format, info = true) %}
{% do run_query(create_external_file_format) %}
{%- endif %}

{% endmacro %}

+   {% macro synapse__prep_external() %}
+       {% do return( dbt_external_tables_integration_tests.sqlserver__prep_external()) %}
+   {% endmacro %}
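
The passthrough macro above exists because dbt resolves adapter-specific implementations by name: dispatch looks for `<adapter>__<macro_name>`, and since the Synapse setup is identical to the SQL Server one, `synapse__prep_external` simply delegates. A simplified sketch of the dispatch pattern, assuming the `dbt_external_tables_dispatch_list` var added in `dbt_project.yml` above (this is not the package's verbatim call site):

```sql
{% macro prep_external() %}
    {# search user-supplied packages first, then fall back to the built-in macros #}
    {{ return(adapter.dispatch(
        'prep_external',
        packages = var('dbt_external_tables_dispatch_list', []) + ['dbt_external_tables']
    )()) }}
{% endmacro %}
```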
@@ -1,21 +1,15 @@
version: 2

sources:
-   - name: synapse_external
+   - name: azuresql_external
schema: "{{ target.schema }}"
-     loader: ADLSblob
+     loader: RDBMS cross database query
tables:

- name: people_csv_unpartitioned
-       external: &csv-people
-         location: '/csv'
-         file_format: "{{ target.schema ~ '.dbt_external_ff_testing' }}"
-         reject_type: VALUE
-         reject_value: 0
-         ansi_nulls: true
-         quoted_identifier: true
+       external:
+         data_source: "{{ target.schema ~ '.dbt_external_tables_testing' }}"
+         schema_name: 'dbt_external_tables_integration_tests_synapse'
+         object_name: 'people_csv_unpartitioned'
columns: &cols-of-the-people
- name: id
data_type: int
@@ -34,28 +28,6 @@ sources:
- last_name
- email

-     - name: people_csv_partitioned
-       external:
-         <<: *csv-people
-         # TODO: SYNAPSE DOES NOT DO PARTITIONS
-         # (BUT WE COULD MAKE A WORKAROUND !!!)
-         # partitions: &parts-of-the-people
-         #   - name: section
-         #     data_type: varchar
-       columns: *cols-of-the-people
-       tests: *equal-to-the-people
-   - name: azuresql_external
-     schema: "{{ target.schema }}"
-     loader: RDBMS cross database query
-     tables:
-       - name: people_csv_unpartitioned
-         external:
-           data_source: "{{ target.schema ~ '.dbt_external_tables_testing' }}"
-           schema_name: 'dbt_external_tables_integration_tests_synapse'
-           object_name: 'people_csv_unpartitioned'
-         columns: *cols-of-the-people
-         tests: *equal-to-the-people

# TODO: JSON IS NOT SUPPORTED BY SYNAPSE ATM

# - name: people_json_unpartitioned
@@ -71,12 +71,20 @@ sources:

- name: people_csv_partitioned_no_columns
external:
-       <<: *json-people
+       <<: *csv-people
partitions: *parts-of-the-people
tests: *same-rowcount

+   - name: people_csv_with_keyword_colname
+     external: *csv-people
+     columns:
+       - name: UNION
+         quote: true
+         data_type: varchar(64)
+     tests: *same-rowcount

- name: people_json_unpartitioned_no_columns
-     external: *csv-people
+     external: *json-people
tests: *same-rowcount

- name: people_json_partitioned_no_columns
File renamed without changes.
46 changes: 46 additions & 0 deletions integration_tests/models/plugins/synapse_external.yml
@@ -0,0 +1,46 @@
version: 2

sources:
- name: synapse_external
schema: "{{ target.schema }}"
loader: ADLSblob

tables:

- name: people_csv_unpartitioned
external: &csv-people
location: '/csv'
file_format: "{{ target.schema ~ '.dbt_external_ff_testing' }}"
data_source: "{{ target.schema ~ '.dbt_external_tables_testing' }}"
reject_type: VALUE
reject_value: 0
ansi_nulls: true
quoted_identifier: true
columns: &cols-of-the-people
- name: id
data_type: int
- name: first_name
data_type: varchar(64)
- name: last_name
data_type: varchar(64)
- name: email
data_type: varchar(64)
tests: &equal-to-the-people
- dbt_external_tables_integration_tests.tsql_equality:
compare_model: ref('people')
compare_columns:
- id
- first_name
- last_name
- email

- name: people_csv_partitioned
external:
<<: *csv-people
# TODO: SYNAPSE DOES NOT DO PARTITIONS
# (BUT WE COULD MAKE A WORKAROUND !!!)
# partitions: &parts-of-the-people
# - name: section
# data_type: varchar
columns: *cols-of-the-people
tests: *equal-to-the-people
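
For context, a source defined this way is staged as a Synapse external table. Roughly, the rendered DDL looks like the hand-written sketch below (schema names are illustrative and this is not the macro's verbatim output; `ansi_nulls` and `quoted_identifier` translate to the corresponding `SET` options):

```sql
-- illustrative only: my_schema stands in for the rendered target schema
IF OBJECT_ID('my_schema.people_csv_unpartitioned') IS NOT NULL
    DROP EXTERNAL TABLE my_schema.people_csv_unpartitioned;

CREATE EXTERNAL TABLE my_schema.people_csv_unpartitioned (
    id int,
    first_name varchar(64),
    last_name varchar(64),
    email varchar(64)
)
WITH (
    LOCATION = '/csv',
    DATA_SOURCE = my_schema.dbt_external_tables_testing,
    FILE_FORMAT = my_schema.dbt_external_ff_testing,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);
```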
4 changes: 4 additions & 0 deletions macros/common/stage_external_sources.sql
@@ -32,6 +32,10 @@
{% endif %}

{% endfor %}

+   {% if sources_to_stage|length == 0 %}
+       {% do dbt_utils.log_info('No external sources selected') %}
+   {% endif %}

{% for node in sources_to_stage %}

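
With this change, a run whose selector matches no source nodes logs a notice instead of finishing silently. For example (`no_such_source` is a hypothetical selector):

```bash
$ dbt run-operation stage_external_sources --args "select: no_such_source"
# logs: No external sources selected
```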
