Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monkeypatch BiqQuery adapter to retrive SQL for async execution #1474

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

pankajkoti
Copy link
Contributor

@pankajkoti pankajkoti commented Jan 21, 2025

closes: #1260
related: #1261
related: #1266
closes: #1265

Copy link

netlify bot commented Jan 21, 2025

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 379d997
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/67937281c7956200082519af

@pankajkoti pankajkoti changed the title Monkeypatch BiqQuery adapter for retriveing SQL for async execution Monkeypatch BiqQuery adapter to retrive SQL for async execution Jan 21, 2025
cosmos/operators/local.py Outdated Show resolved Hide resolved
cosmos/operators/local.py Outdated Show resolved Hide resolved
Copy link

cloudflare-workers-and-pages bot commented Jan 21, 2025

Deploying astronomer-cosmos with  Cloudflare Pages  Cloudflare Pages

Latest commit: 379d997
Status: ✅  Deploy successful!
Preview URL: https://c520c11a.astronomer-cosmos.pages.dev
Branch Preview URL: https://monkeypatch-bq-adapter.astronomer-cosmos.pages.dev

View logs

@@ -371,32 +370,6 @@ def generate_task_or_group(
return task_or_group


def _add_dbt_compile_task(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we agreed, I've added a task to follow up on this:
#1477

Ideally, we'd compare both approaches before 1.9, but I'm happy with us to go ahead with approach of #1474 in 1.9 - it is already a significant improvement when compared to 1.7.0, since it solves the existing SQL query issue.

) -> Tuple[BigQueryAdapterResponse, agate.Table]:
return BigQueryAdapterResponse("mock_bigquery_adapter_response"), empty_table()

BigQueryConnectionManager.execute = execute
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have to mock a different method when using dbt build. That said, this is outside of the scope of fixing the original but this PR solves, so I've logged a follow up ticket:
#1476

Comment on lines 68 to 69
"gcp_project",
"dataset",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we remove these as templated fields?

@@ -60,13 +55,11 @@ class DbtSourceAirflowAsyncOperator(DbtBaseAirflowAsyncOperator, DbtSourceLocalO
pass


class DbtRunAirflowAsyncOperator(BigQueryInsertJobOperator): # type: ignore
class DbtRunAirflowAsyncOperator(BigQueryInsertJobOperator, DbtRunLocalOperator): # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if we can see the SQL statement in the operator Templated Field, like we can do with the Cosmos Local operators. If this is too much work, I'm happy for it to be done in a follow up PR/ticket.

Comment on lines 435 to 492
return_sql: bool = False,
sql_context: dict[str, Any] | None = None,
) -> FullOutputSubprocessResult | dbtRunnerResult | str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused with this method, given what the LocalOperatorrun_command does:

def run_command(
self,
cmd: list[str],
env: dict[str, str | bytes | os.PathLike[Any]],
context: Context,
) -> FullOutputSubprocessResult | dbtRunnerResult:

The LocalOperator currently:

  • runs the dbt command
  • runs the actual transformation in the data warehouse

I suggest one of the following is done:
a) have both methods accomplishing the same goal (not only running the dbt command, but actually executing it in the data warehouse) or
b) name this method something else

If we decide to do approach (a), I believe we should:

  • stick to the existing method interface, exposing it in the cosmos BaseOperator class
  • change this method in this PR to also execute the SQL statement

@@ -647,11 +669,15 @@ def get_openlineage_facets_on_complete(self, task_instance: TaskInstance) -> Ope
)

def build_and_run_cmd(
self, context: Context, cmd_flags: list[str] | None = None
self,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar feedback to #1474 (comment) here.

Concerns regarding:

  • inconsistent interface between different ExecutionMode implementations
  • expectation that this method would not only build the command with necessary flags, but actually have executed the dbt transformation by the end of its run

Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pankajkoti I'm very excited that we now have a more reliable way of calculating the full dbt SQL query. This approach fixes #1260 and solves many of the async tickets we have open.

Monkey-patching always carries a risk, but it is worth it at this stage.

It would be great if - either as part of this PR - or as a priority follow-up PR, we have an efficient way of testing that the monkey patching works in multiple versions of dbt, including the latest releases, and that the transformation is not being executed when we run the dbt command. I believe this must be done before we release this feature in 1.9.0

I've logged two follow-up tickets that are relevant:

It would be great if these could be accomplished before 1.9.0 release, but I'm also happy with us sticking to approach if time does not allow further analysis / implementation.

@pankajkoti
Copy link
Contributor Author

cc: @joppevos for visibility on the ongoing work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants