parquet file parallel read settings #13635
Unanswered
jiangjiguang
asked this question in
Q&A
Replies: 1 comment
-
We can't create multiple splits from a single row group in the current code. In theory it might be possible to divide a row group into smaller chunks by looking at column-level page offsets, but it would significantly complicate the reader code.
-
I have a Parquet file of 251.86 MB. The block size is 128 MB, so the file has two row groups.
The problem is: no matter how many drivers I configure, only two drivers read the file, and each driver reads one row group.
I have read the code and found the problem below:
How can I configure multiple drivers to read a single row group?
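To make the numbers above concrete, here is a rough Python sketch of why extra drivers stay idle. It is a simplified model, not the engine's actual reader code: the function names are hypothetical, row groups are assumed to be exactly 128 MB and laid out back to back, and the rule that a row group belongs to the split containing its start offset is an assumption made for illustration.

```python
MB = 1024 * 1024


def row_group_starts(file_size, row_group_size):
    """Start offsets of the row groups (assumption: each one is filled to
    exactly row_group_size bytes and they are laid out back to back)."""
    return list(range(0, file_size, row_group_size))


def assign_row_groups(file_size, row_group_size, split_size):
    """Cut the file into byte-range splits and list, per split, the indexes
    of the row groups whose start offset falls inside that split.

    Assumption: a row group is never divided, so it is read entirely by
    the one split that contains its start offset.
    """
    starts = row_group_starts(file_size, row_group_size)
    splits = []
    for split_start in range(0, file_size, split_size):
        split_end = min(split_start + split_size, file_size)
        owned = [i for i, s in enumerate(starts) if split_start <= s < split_end]
        splits.append(owned)
    return splits


def busy_readers(file_size, row_group_size, split_size):
    """Number of splits that actually have a row group to read."""
    return sum(1 for owned in assign_row_groups(file_size, row_group_size, split_size) if owned)


FILE_SIZE = int(251.86 * MB)   # file size from the question
ROW_GROUP_SIZE = 128 * MB      # block size from the question

if __name__ == "__main__":
    for split_mb in (32, 64, 128, 256):
        splits = assign_row_groups(FILE_SIZE, ROW_GROUP_SIZE, split_mb * MB)
        busy = busy_readers(FILE_SIZE, ROW_GROUP_SIZE, split_mb * MB)
        print(f"split size {split_mb} MB: {len(splits)} splits, {busy} with work")
```

In this model, any split size of 128 MB or less produces more splits but still only two that own a row group, which matches the behaviour described in the question. Under the same assumptions, the only way to get more parallel readers is for the file to contain more, smaller row groups.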