[SUPPORT] Hudi Custom Merge Payload results in UnsupportedOperationException #12571
Comments
I will attempt to reproduce the issue internally and follow up with the next steps.
The Spark job completes successfully with the default payload. I suspect the issue lies with the custom payload. Could you please review your code to identify the problem?
@rangareddy - I'm running into the issue with the custom payload that I linked in this ticket. Were you able to reproduce that on your end? Or are you saying that you generated the random data and then used the default payload to run the Spark job? I'm seeking help because the custom payload is causing an exception and the default one is not.
@dataproblems just letting you know this is on our list; we'll spare some bandwidth and repro this.
@xushiyan - thank you for letting me know! I'm looking forward to finding a resolution!
Problem Description
I created a custom payload for merging records on my Hudi table (which has the record level index enabled) and then tried to upsert values into the table. The upsert succeeded for a single record as well as for batches of fewer than 50k records, but once I tried 50k records or more, I ran into an UnsupportedOperationException (stack trace provided below).
How to reproduce the problem?
Step 1: Generate the data and the Hudi table
Case class that encapsulates my dataset
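The original code block didn't survive the page capture; a minimal sketch of such a case class, with entirely assumed field names, might be:

```scala
// Hypothetical dataset shape; every field name here is an assumption,
// not the reporter's actual schema.
case class Record(
  recordKey: String, // unique key, used as the Hudi record key
  partition: String, // partition path column
  ts: Long,          // precombine / ordering field
  value: String      // arbitrary data column
)
```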
Spark code to generate sample dataframe
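The generator code is likewise missing; under the assumed `Record` schema above, it could plausibly look like this:

```scala
import java.util.UUID
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hudi-payload-repro").getOrCreate()
import spark.implicits._

// Generate 100k random rows (the actual volume used in the report is unknown).
val df = spark.range(100000).map { i =>
  Record(
    recordKey = UUID.randomUUID().toString,
    partition = s"part_${i % 10}",
    ts        = System.currentTimeMillis(),
    value     = s"value_$i")
}.toDF()
```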
Step 2: Create the Hudi table from the random data
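The exact write options were lost with the original snippet; below is a sketch of an initial write with the record level index enabled. The path, table name, and config keys are my assumptions for Hudi 0.15, not the reporter's actual settings:

```scala
import org.apache.spark.sql.SaveMode

val basePath = "s3://bucket/path/repro_table" // placeholder path

df.write.format("hudi").
  option("hoodie.table.name", "repro_table").
  option("hoodie.datasource.write.recordkey.field", "recordKey").
  option("hoodie.datasource.write.partitionpath.field", "partition").
  option("hoodie.datasource.write.precombine.field", "ts").
  // Record level index, as described in the problem statement.
  option("hoodie.metadata.record.index.enable", "true").
  option("hoodie.index.type", "RECORD_INDEX").
  mode(SaveMode.Overwrite).
  save(basePath)
```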
Step 3: Create a custom hudi payload class
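The reporter's payload class is the crux of the issue but wasn't captured. As a stand-in, here is the general shape of a custom payload in Hudi 0.15: extend `OverwriteWithLatestAvroPayload` and override `combineAndGetUpdateValue`. The merge rule below (prefer the incoming value, fall back to the stored value for null fields) is invented purely for illustration:

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericRecord, IndexedRecord}
import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
import org.apache.hudi.common.util.{Option => HOption}

// Illustrative payload only; the reporter's actual merge logic is unknown.
class CustomMergePayload(record: GenericRecord, orderingVal: Comparable[_])
  extends OverwriteWithLatestAvroPayload(record, orderingVal) {

  def this(record: HOption[GenericRecord]) =
    this(if (record.isPresent) record.get() else null, 0)

  override def combineAndGetUpdateValue(
      currentValue: IndexedRecord,
      schema: Schema): HOption[IndexedRecord] = {
    val incomingOpt = getInsertValue(schema)
    if (!incomingOpt.isPresent) {
      // Incoming record is a delete; nothing to merge.
      return HOption.empty()
    }
    val incoming = incomingOpt.get().asInstanceOf[GenericRecord]
    val stored = currentValue.asInstanceOf[GenericRecord]
    // Field-by-field merge: keep the stored value wherever the incoming
    // record has a null.
    schema.getFields.forEach { f =>
      if (incoming.get(f.name()) == null) {
        incoming.put(f.name(), stored.get(f.name()))
      }
    }
    HOption.of(incoming: IndexedRecord)
  }
}
```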
Step 4: Create a data parcel for the upsert
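The update batch ("data parcel") also didn't survive; a plausible stand-in reads existing keys back from the table and rewrites a column, at the ~50k size where the failure reportedly starts:

```scala
import org.apache.spark.sql.functions.lit

// Take 50k existing keys and change a data column so the upsert must merge.
val parcel = spark.read.format("hudi").load(basePath).
  select("recordKey", "partition", "ts", "value").
  limit(50000).
  withColumn("ts", lit(System.currentTimeMillis())).
  withColumn("value", lit("updated"))
```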
Step 5: Perform the upsert operation
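And the upsert itself, wired to the custom payload; the fully qualified class name is the hypothetical one from the Step 3 sketch:

```scala
parcel.write.format("hudi").
  option("hoodie.table.name", "repro_table").
  option("hoodie.datasource.write.recordkey.field", "recordKey").
  option("hoodie.datasource.write.partitionpath.field", "partition").
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.datasource.write.operation", "upsert").
  // Assumed package; point Hudi at the custom payload class.
  option("hoodie.datasource.write.payload.class", "com.example.CustomMergePayload").
  mode(SaveMode.Append).
  save(basePath)
```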
Expected Behavior
I expect the upsert operation to go through without any exceptions.
Environment Description
- Hudi version: 0.15.0
- Spark version: 3.4.1
- Hive version:
- Hadoop version:
- Storage (HDFS/S3/GCS..): S3
- Running on Docker? (yes/no): No
Additional Context
This problem does not surface with 20k or 30k rows in the upsert batch, but it does when I go higher.
Stacktrace