[ADAP-802] [Bug] Unable to read Iceberg tables when using session connection #490

Open
2 tasks done
joleyjol opened this issue Aug 14, 2023 · 4 comments
Labels
feature:iceberg Issues related to Iceberg support pkg:dbt-spark Issue affects dbt-spark Stale Mark an issue or PR as stale, to be closed triage:product In Product's queue type:bug Something isn't working as documented

Comments

@joleyjol

Is this a new bug in dbt-spark?

  • I believe this is a new bug in dbt-spark
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

A dbt-spark project using the session connection method is unable to read Iceberg tables from the Glue catalog. The run fails with a pyspark.sql.utils.AnalysisException with desc = SHOW TABLE EXTENDED is not supported for v2 tables.

I did some digging and I think the issue is related to the exception_handler in connections.py

In particular, this block:

        except Exception as exc:
            logger.debug("Error while running:\n{}".format(sql))
            logger.debug(exc)
            if len(exc.args) == 0:
                raise

I've verified that my job is hitting the len(exc.args) == 0 condition. This is probably because I'm using the session connection method, but I haven't verified the cause.
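
One plausible way an exception can end up with empty args is sketched below. This is an assumption for illustration, not something checked against the pyspark source: CapturedError is a hypothetical stand-in class, not a pyspark type. In Python, BaseException stores only the positional constructor arguments in args, so an exception built with keyword arguments alone has len(exc.args) == 0 even though it carries a message.

```python
class CapturedError(Exception):
    """Hypothetical stand-in for an exception that keeps its message in `desc`."""

    def __init__(self, desc=None):
        # super().__init__ is not called, so `args` holds only what
        # BaseException.__new__ captured from the positional arguments.
        self.desc = desc

    def __str__(self):
        return str(self.desc)


message = "SHOW TABLE EXTENDED is not supported for v2 tables."

positional = CapturedError(message)    # message passed positionally
keyword = CapturedError(desc=message)  # message passed by keyword only

print(len(positional.args))  # 1
print(len(keyword.args))     # 0
```

If the session connection method surfaces exceptions constructed this way, the bare raise branch would be taken regardless of the message content.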

I was able to work around this error in my local environment by raising a DbtRuntimeError with the desc from the original exception, instead of just re-raising the original exception itself.

Is there any reason this method should ever re-raise the original error instead of a DbtRuntimeError?
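
For reference, the workaround described above could look roughly like the following. This is a minimal sketch, not the actual dbt-spark code: DbtRuntimeError is redefined here as a stand-in so the snippet runs without dbt installed, the handler for exceptions with non-empty args is simplified, and reading the message from a desc attribute is an assumption based on the exception seen in this report.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger(__name__)


class DbtRuntimeError(Exception):
    """Stand-in for dbt's DbtRuntimeError (assumption: it exposes `msg`)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.msg = msg


@contextmanager
def exception_handler(sql):
    """Simplified handler: always surface failures as DbtRuntimeError."""
    try:
        yield
    except DbtRuntimeError:
        # Already the expected type, pass it through unchanged.
        raise
    except Exception as exc:
        logger.debug("Error while running:\n%s", sql)
        logger.debug(exc)
        if len(exc.args) == 0:
            # The original code does a bare `raise` here. The workaround
            # wraps the exception instead, preferring `desc` when present.
            raise DbtRuntimeError(getattr(exc, "desc", None) or str(exc)) from exc
        # Simplified: the real handler has more logic for this case.
        raise DbtRuntimeError(str(exc)) from exc
```

Using raise ... from exc keeps the original exception attached as __cause__, so the full Spark traceback is still available in the debug logs.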

Expected Behavior

The pyspark.sql.utils.AnalysisException should have been wrapped in a DbtRuntimeError, and thus handled by the existing logic that checks for this specific error message to deal with Iceberg table metadata properly.
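
To illustrate why the wrapping matters, here is a rough sketch of the kind of message-based fallback referred to above. It is an assumption about the shape of that logic, not a copy of the adapter code: list_relations, run_show_table_extended, and run_show_tables are hypothetical names, and DbtRuntimeError is a stand-in class. The point is that the fallback only triggers inside an except DbtRuntimeError block, so a raw AnalysisException bypasses it entirely.

```python
class DbtRuntimeError(Exception):
    """Stand-in for dbt's DbtRuntimeError (assumption: it exposes `msg`)."""

    def __init__(self, msg):
        super().__init__(msg)
        self.msg = msg


V2_TABLES_MESSAGE = "SHOW TABLE EXTENDED is not supported for v2 tables"


def list_relations(run_show_table_extended, run_show_tables):
    """Try the extended listing first, then fall back for Iceberg v2 tables.

    Both arguments are hypothetical callables that return a list of rows.
    """
    try:
        return run_show_table_extended()
    except DbtRuntimeError as exc:
        errmsg = getattr(exc, "msg", "")
        if V2_TABLES_MESSAGE in errmsg:
            # Iceberg v2 tables: use a listing that does not rely on
            # SHOW TABLE EXTENDED.
            return run_show_tables()
        raise
```

Any exception type other than DbtRuntimeError propagates straight out of list_relations, which matches the failure shown in the log output below.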

Steps To Reproduce

  1. Run dbt-spark in a project configured with the session connection method
  2. Run a model that reads an Iceberg table from Glue
  3. Observe that the run fails due to a pyspark.sql.utils.AnalysisException

Relevant log output

20:04:36.994919 [error] [MainThread]: Encountered an error:
SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#21, tableName#22, isTemporary#23, information#24]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@50b0402d, [dbt_iceberg_db]
20:04:37.002637 [error] [MainThread]: Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 87, in wrapper
    result, success = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 143, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 172, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 219, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 259, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/main.py", line 278, in docs_generate
    results = task.run()
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/generate.py", line 206, in run
    compile_results = CompileTask.run(self)
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/runnable.py", line 468, in run
    result = self.execute_with_hooks(selected_uids)
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/runnable.py", line 428, in execute_with_hooks
    self.before_run(adapter, selected_uids)
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/runnable.py", line 415, in before_run
    self.populate_adapter_cache(adapter)
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/runnable.py", line 406, in populate_adapter_cache
    adapter.set_relations_cache(self.manifest)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 473, in set_relations_cache
    self._relations_cache_for_schemas(manifest, required_schemas)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 450, in _relations_cache_for_schemas
    for relation in future.result():
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/utils.py", line 465, in connected
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/spark/impl.py", line 213, in list_relations_without_caching
    show_table_extended_rows = self.execute_macro(LIST_RELATIONS_MACRO_NAME, kwargs=kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 1054, in execute_macro
    result = macro_function(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 330, in __call__
    return self.call_macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 257, in call_macro
    return macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 21, in macro
  File "/opt/conda/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 330, in __call__
    return self.call_macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 257, in call_macro
    return macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 33, in macro
  File "/opt/conda/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 330, in __call__
    return self.call_macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 257, in call_macro
    return macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 52, in macro
  File "/opt/conda/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 290, in execute
    return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/sql/connections.py", line 146, in execute
    _, cursor = self.add_query(sql, auto_begin)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/sql/connections.py", line 80, in add_query
    cursor.execute(sql, bindings)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/spark/session.py", line 208, in execute
    self._cursor.execute(sql)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/spark/session.py", line 110, in execute
    self._df = spark_session.sql(sql)
  File "/opt/conda/lib/python3.10/site-packages/pyspark/sql/session.py", line 1034, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self)
  File "/opt/conda/lib/python3.10/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/opt/conda/lib/python3.10/site-packages/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#21, tableName#22, isTemporary#23, information#24]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@50b0402d, [dbt_iceberg_db]

Environment

- OS: Ubuntu 22.04.1 LTS
- Python: 3.10.8
- dbt-core: 1.6.0
- dbt-spark: 1.6.0

Additional Context

No response

@joleyjol joleyjol added type:bug Something isn't working as documented triage:product In Product's queue labels Aug 14, 2023
@github-actions github-actions bot changed the title [Bug] Unable to read Iceberg tables when using session connection [ADAP-802] [Bug] Unable to read Iceberg tables when using session connection Aug 14, 2023
@ben-schreiber
Contributor

@joleyjol this looks similar to dbt-labs/dbt-spark#837. Does the fix there solve this issue as well?

@joleyjol
Author

It looks like this should resolve my issue as well, thanks

@tanweipeng

@joleyjol, so the issue you raised is about wrapping the exception in a DbtRuntimeError, not about the "SHOW TABLE EXTENDED is not supported for v2 tables" error itself, right?

@dbeatty10 dbeatty10 added the feature:iceberg Issues related to Iceberg support label Feb 7, 2024
Contributor

github-actions bot commented Aug 6, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale Mark an issue or PR as stale, to be closed label Aug 6, 2024
@mikealfare mikealfare added the pkg:dbt-spark Issue affects dbt-spark label Jan 13, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-spark Jan 13, 2025