add support for converting traditional hive tables to iceberg/delta/hudi #550
Comments
This seems like a pretty easy lift. There are a number of use cases where simply adding parquet files to the table would be handy.
Yes, this can be done. We need to implement a Parquet source class that does two things: retrieve the snapshot, and retrieve the change log since lastSyncTime. There are two possible approaches: listing files, or using a cloud notifications queue.
The design is similar to what Hudi does for ingesting a large number of files; steps 7 and 8 in the architecture would become the XTable sync. If you are using HDFS or an object store that doesn't support a queue-based system for file notifications, we need to build or reuse an existing queue implementation for file notifications.
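To make the file-listing approach concrete, here is a minimal sketch in Python rather than the project's Java, and it does not use any XTable API. The names `DataFile`, `get_snapshot`, and `get_changes_since` are hypothetical, and it assumes a local file system where a file's modification time is a usable proxy for when it was added.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical record for one Parquet file found by listing."""
    path: str
    size_bytes: int
    modified_at: float  # seconds since epoch, as reported by the file system


def list_parquet_files(table_root: str) -> List[DataFile]:
    """Recursively list every *.parquet file under table_root."""
    files = []
    for p in sorted(Path(table_root).rglob("*.parquet")):
        stat = p.stat()
        files.append(DataFile(str(p), stat.st_size, stat.st_mtime))
    return files


def get_snapshot(table_root: str) -> List[DataFile]:
    """Snapshot: all Parquet files currently present in the table directory."""
    return list_parquet_files(table_root)


def get_changes_since(table_root: str, last_sync_time: float) -> List[DataFile]:
    """Change log: files modified strictly after last_sync_time.

    Assumption: files are only ever added, never rewritten in place.
    A plain listing cannot report files that were deleted since the last sync.
    """
    return [f for f in list_parquet_files(table_root)
            if f.modified_at > last_sync_time]
```

Each call re-lists the whole directory tree, which is why listing fits a small number of files and a notification queue fits larger tables.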
@djouallah @JDLongPLMR Let me know what you think of the two approaches. We can write this as a utility tool in xtable-utilities, similar to RunSync.
I believe you can convert Parquet to Hudi files via Hudi bootstrapping (https://hudi.apache.org/docs/migration_guide). Once it's in Hudi, you can use Apache XTable to convert it to other formats. Onehouse can do this automatically.
Listing files seems good for my use case.
@djouallah Yes, listing will be sufficient for a small number of files. Do you plan to submit a PR for this? Let me know if you need any help with the PR.
@vinishjail97 Nah, I am just an end user of XTable :)
Okay, I will start a thread on the dev mailing list to see if someone is interested in working on this feature.
Thanks all. Seems like a helpful addition.
Hi @vinishjail97, if it is not assigned to anyone yet, I would like to explore and take up this feature.
Yes @sudharshanraja-db, you can pick up the first sub-task, the file listing utility, if you are interested. Let me know what you think. The second sub-task is more open-ended, and we can discuss the design on the dev mailing list before finalizing the approach.
Thanks @vinishjail97. As you suggested, I will pick up and start this feature so that I can understand the project and its structure better, then look into and discuss the design of the second task.
For the first task, you can look at the RunSync class in xtable-utilities and explore the other modules.
Feature Request / Improvement
There are a lot of systems that produce only Parquet files. It would be useful if XTable could convert those Parquet files to modern table formats without rewriting data, just by continuously adding metadata.
Delta does that already, but it is a one-off operation and can't accept new files.