Problem statement
This issue is a follow-up to the following issue and PR (which most probably can be closed).
There are two different platforms in AWS for representing tables: Glue Catalog and Athena.
Glue Catalog is similar to the Hive catalog, and Athena is similar to Trino.
Currently the Glue ingestor supports the `athena` platform (datahub/metadata-ingestion/src/datahub/ingestion/source/aws/glue.py, line 120 in 4b79e75), which is not correct for the following reason: Athena has the concept of a `catalog`, which is used for cross-account data queries, federated queries, etc., whereas the Glue Catalog is represented only once per AWS account.
Because of this, the correct table representation is:
for Glue: `database.table`
for Athena: `catalog.schema.table`
but currently it is the same for both engines.
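To make the two naming schemes concrete, here is a minimal sketch of how the names could be built per platform. The helper `qualified_table_name` and its parameters are hypothetical and are not part of the DataHub code base. `AwsDataCatalog` is the name of Athena's default catalog, and falling back to it when no catalog is given is an assumption of this sketch.

```python
from typing import Optional

# Name of Athena's default catalog; using it as a fallback is an assumption.
DEFAULT_ATHENA_CATALOG = "AwsDataCatalog"


def qualified_table_name(
    platform: str,
    database: str,
    table: str,
    catalog: Optional[str] = None,
) -> str:
    """Build a dotted table name for a platform (hypothetical helper)."""
    if platform == "glue":
        # Per the issue, the Glue Catalog exists once per AWS account,
        # so the name carries no catalog prefix.
        return f"{database}.{table}"
    if platform == "athena":
        # Athena can address several catalogs (cross-account, federated),
        # so the catalog is part of the name.
        return f"{catalog or DEFAULT_ATHENA_CATALOG}.{database}.{table}"
    raise ValueError(f"unsupported platform: {platform!r}")


print(qualified_table_name("glue", "sales", "orders"))
print(qualified_table_name("athena", "sales", "orders", catalog="shared_catalog"))
```

With this split, the same Glue table reached through two different Athena catalogs would get two distinct Athena names, while its Glue name stays the same.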
There is another issue: for the Athena platform, the Athena SQL interface is used to retrieve table metadata, which performs poorly and may not contain the full metadata about tables.
With the above in mind, it would be great to implement the following re-engineering:
- remove the `athena` platform as a possible option for the `glue` ingestor
- add the `database` concept to the `athena` ingestor; this needs a fix at datahub/metadata-ingestion/src/datahub/ingestion/source/sql/athena.py, line 416 in 4b79e75