AWS Glue and Athena reengineering #12410

Open
svdimchenko opened this issue Jan 21, 2025 · 0 comments

Problem statement

This issue is a follow-up to the following issue and PR (which can most likely be closed).

AWS has two different platforms for representing tables: the Glue Catalog and Athena.
The Glue Catalog is similar to a Hive catalog, and Athena is similar to Trino.

Currently, the Glue ingestor accepts Athena as a platform:

VALID_PLATFORMS = [DEFAULT_PLATFORM, "athena"]
This is incorrect for the following reason: Athena has the concept of a catalog, which is used for cross-account queries, federated queries, etc., whereas there is only one Glue Catalog per AWS account (per region).
Because of this, the correct table representation is:

  • for Glue: database.table
  • for Athena: catalog.schema.table

However, both engines currently use the same representation.
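To make the naming difference concrete, here is a minimal sketch of how dataset names could be built per platform. `TableRef`, `glue_dataset_name`, and `athena_dataset_name` are hypothetical names, not the ingestor's code; `AwsDataCatalog` is Athena's default catalog name and `partner_catalog` is a made-up second catalog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableRef:
    """Hypothetical table reference; `catalog` is only meaningful for Athena."""
    database: str  # a Glue "database" is what Athena calls a "schema"
    table: str
    catalog: Optional[str] = None  # Athena data catalog, e.g. "AwsDataCatalog"


def glue_dataset_name(ref: TableRef) -> str:
    # One Glue Catalog per account and region, so the catalog stays implicit
    return f"{ref.database}.{ref.table}"


def athena_dataset_name(ref: TableRef) -> str:
    # Athena can query several catalogs, so the catalog must be part of the name
    if not ref.catalog:
        raise ValueError("Athena dataset names require a catalog")
    return f"{ref.catalog}.{ref.database}.{ref.table}"


orders_default = TableRef(database="sales", table="orders", catalog="AwsDataCatalog")
orders_shared = TableRef(database="sales", table="orders", catalog="partner_catalog")

print(glue_dataset_name(orders_default))    # sales.orders
print(athena_dataset_name(orders_default))  # AwsDataCatalog.sales.orders
print(athena_dataset_name(orders_shared))   # partner_catalog.sales.orders
```

With a shared `database.table` form, both Athena tables above would collapse to `sales.orders`. Including the catalog keeps them distinct.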

There is another issue: for the Athena platform, the Athena SQL interface is used to retrieve table metadata. This performs poorly and may not return complete metadata for tables.

Given the above, it would be great to implement the following reengineering:

  • remove athena as a valid platform option for the Glue ingestor
  • add the database concept to the Athena ingestor; the following needs to be fixed:
    # In Athena the schema is the database and database is not existing
  • add the ability to retrieve Athena table metadata via the AWS Glue API where possible (see the following PR as an example)
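The last proposal could look roughly like the following sketch. This is a hypothetical illustration, not the ingestor's code. `glue_catalog_id_for` and `iter_table_metadata` are made-up names, and the clients are assumed to be boto3 Athena and Glue clients (stubbed here so the example runs offline). The sketch calls Athena's `GetDataCatalog` to detect Glue-backed catalogs and reads their tables through Glue `GetTables`. For other catalog types it falls back to Athena's `ListTableMetadata` API, which stands in for the current SQL-based path.

```python
from typing import Any, Dict, Iterator, List, Optional


def glue_catalog_id_for(athena_client: Any, catalog_name: str) -> Optional[str]:
    """Return the Glue catalog ID behind an Athena catalog, or None if it is not Glue-backed."""
    catalog = athena_client.get_data_catalog(Name=catalog_name)["DataCatalog"]
    if catalog.get("Type") != "GLUE":
        return None
    # GLUE-type Athena catalogs carry the owning account ID in the "catalog-id" parameter
    return catalog.get("Parameters", {}).get("catalog-id")


def _column_names(columns: List[Dict[str, Any]]) -> List[str]:
    return [column["Name"] for column in columns]


def iter_table_metadata(
    athena_client: Any, glue_client: Any, catalog_name: str, database: str
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per table, preferring the Glue API and falling back to Athena."""
    catalog_id = glue_catalog_id_for(athena_client, catalog_name)
    if catalog_id is not None:
        # Glue-backed catalog: read table definitions directly from the Glue API
        paginator = glue_client.get_paginator("get_tables")
        for page in paginator.paginate(CatalogId=catalog_id, DatabaseName=database):
            for table in page["TableList"]:
                columns = table.get("StorageDescriptor", {}).get("Columns", [])
                columns = columns + table.get("PartitionKeys", [])
                yield {"name": table["Name"], "columns": _column_names(columns), "source": "glue"}
    else:
        # Non-Glue catalog (e.g. a federated connector): use Athena's ListTableMetadata
        paginator = athena_client.get_paginator("list_table_metadata")
        for page in paginator.paginate(CatalogName=catalog_name, DatabaseName=database):
            for table in page["TableMetadataList"]:
                columns = table.get("Columns", []) + table.get("PartitionKeys", [])
                yield {"name": table["Name"], "columns": _column_names(columns), "source": "athena"}


# Offline stubs standing in for boto3.client("athena") and boto3.client("glue")
class StubPaginator:
    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self.pages = pages
        self.calls: List[Dict[str, Any]] = []

    def paginate(self, **kwargs: Any) -> Iterator[Dict[str, Any]]:
        self.calls.append(kwargs)  # record request parameters for inspection
        return iter(self.pages)


class StubClient:
    def __init__(self, catalogs: Dict[str, Dict[str, Any]], paginators: Dict[str, StubPaginator]) -> None:
        self.catalogs = catalogs
        self.paginators = paginators

    def get_data_catalog(self, Name: str) -> Dict[str, Any]:
        return {"DataCatalog": self.catalogs[Name]}

    def get_paginator(self, operation_name: str) -> StubPaginator:
        return self.paginators[operation_name]


glue_pages = StubPaginator([{"TableList": [{
    "Name": "orders",
    "StorageDescriptor": {"Columns": [{"Name": "id", "Type": "bigint"}]},
    "PartitionKeys": [{"Name": "dt", "Type": "string"}],
}]}])
athena_pages = StubPaginator([{"TableMetadataList": [
    {"Name": "customers", "Columns": [{"Name": "email", "Type": "varchar"}]},
]}])
athena = StubClient(
    catalogs={
        "AwsDataCatalog": {"Type": "GLUE", "Parameters": {"catalog-id": "123456789012"}},
        "federated_pg": {"Type": "LAMBDA", "Parameters": {}},
    },
    paginators={"list_table_metadata": athena_pages},
)
glue = StubClient(catalogs={}, paginators={"get_tables": glue_pages})

glue_tables = list(iter_table_metadata(athena, glue, "AwsDataCatalog", "sales"))
federated_tables = list(iter_table_metadata(athena, glue, "federated_pg", "public"))
print(glue_tables)       # [{'name': 'orders', 'columns': ['id', 'dt'], 'source': 'glue'}]
print(federated_tables)  # [{'name': 'customers', 'columns': ['email'], 'source': 'athena'}]
```

Against real AWS, the stubs would be replaced by `boto3.client("athena")` and `boto3.client("glue")`. Choosing the path per catalog type keeps federated catalogs working, while Glue-backed catalogs get their table definitions from Glue directly.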