Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] VariableWidthViewVectorBenchmarks #41

Open
ViggoC opened this issue Nov 5, 2024 · 22 comments · May be fixed by #557
Open

[Java] VariableWidthViewVectorBenchmarks #41

ViggoC opened this issue Nov 5, 2024 · 22 comments · May be fixed by #557
Assignees
Labels
Type: enhancement New feature or request

Comments

@ViggoC
Copy link
Contributor

ViggoC commented Nov 5, 2024

The performance of VariableWidthViewVector is much slower than VariableWidthVector in some cases. setZero for overwriting is the bottleneck.

VariableWidthViewVectorBenchmarks.setSafeFromArray       1  avgt    5  0.088 ± 0.002  ms/op
VariableWidthViewVectorBenchmarks.setSafeFromArray       2  avgt    5  0.095 ± 0.020  ms/op
VariableWidthViewVectorBenchmarks.setSafeFromArray      10  avgt    5  0.091 ± 0.012  ms/op
VariableWidthViewVectorBenchmarks.setSafeFromArray      40  avgt    5  0.087 ± 0.001  ms/op
    VariableWidthVectorBenchmarks.setSafeFromArray       1  avgt    5  0.010 ±  0.001  ms/op
    VariableWidthVectorBenchmarks.setSafeFromArray       2  avgt    5  0.013 ±  0.001  ms/op
    VariableWidthVectorBenchmarks.setSafeFromArray      10  avgt    5  0.032 ±  0.001  ms/op
    VariableWidthVectorBenchmarks.setSafeFromArray      40  avgt    5  0.105 ±  0.002  ms/op

image

Environment: 2.6 GHz Intel Core i7, MacOS 13.6, openjdk version "13.0.2"
See code of benchmark in apache/arrow#44634.

Component(s)

Java

@ViggoC ViggoC changed the title VariableWidthViewVectorBenchmarks [Java] VariableWidthViewVectorBenchmarks Nov 5, 2024
@lidavidm
Copy link
Member

lidavidm commented Nov 5, 2024

Thanks for digging into this @ViggoC! I think we can take out the setZero since unreferenced data is unspecified

@ViggoC
Copy link
Contributor Author

ViggoC commented Nov 5, 2024

@lidavidm So we can just remove it without side effort?

@lidavidm
Copy link
Member

lidavidm commented Nov 5, 2024

What do you mean side effort?

@vibhatha
Copy link
Contributor

vibhatha commented Nov 5, 2024

Is it side effect?

@lidavidm
Copy link
Member

lidavidm commented Nov 5, 2024

Oh, good point.

Technically there would be a side effect in that data could be left over in the buffer, and then make it into IPC, but that is the tradeoff. (I believe it's already possible if say you create a string vector, add data, and later set the row count to something lower than before - the extra data won't be cleared.)

@vibhatha
Copy link
Contributor

vibhatha commented Nov 5, 2024

Technically there would be a side effect in that data could be left over in the buffer, and then make it into IPC, but that is the tradeoff. (I believe it's already possible if say you create a string vector, add data, and later set the row count to something lower than before - the extra data won't be cleared.)

Yeah that make sense.

And @ViggoC thanks for looking into this, I am looking into the PR as well. This is much needed to improve the initial version.

@ViggoC
Copy link
Contributor Author

ViggoC commented Nov 5, 2024

@lidavidm sorry, my bad, I mean side effect.
@vibhatha so you'll handle this performance issue?

@ViggoC
Copy link
Contributor Author

ViggoC commented Nov 5, 2024

@vibhatha And I also see that VariableWidthViewVector allocated much more memory than VariableWidthVector in some cases. It runs out of memory(8G) before I changed the allocator.close to iteration level.

@lidavidm
Copy link
Member

lidavidm commented Nov 5, 2024

Hmm, I think we still allocate power-of-two buffers for each of the data buffers, probably we should instead round to some multiple of say 64K or 1 MB or something instead. The implementation is still new so we are working out how best to treat it.

@vibhatha
Copy link
Contributor

vibhatha commented Nov 5, 2024

@ViggoC technically the answer is yes, I can do it, but in a practical sense my phase would be significantly slower than earlier because the time I could allocate would be significantly smaller. If this needs fixing sooner, the best I can do is help to figure out things to do and definitely review the PR if you could work on this feature.

@ViggoC
Copy link
Contributor Author

ViggoC commented Nov 5, 2024

IMHO, The only side effect of removing viewBuffer.setZero is that padding data is no longer 0, Right?

  /*
   * Variable Width View Vector comprises the following format
   *
   * Short strings, length <= 12
   * | Bytes 0-3  | Bytes 4-15                            |
   * |------------|---------------------------------------|
   * | length     | data (padded with 0)                  |
   * |------------|---------------------------------------|
   *
   * Long strings, length > 12
   * | Bytes 0-3  | Bytes 4-7  | Bytes 8-11 | Bytes 12-15 |
   * |------------|------------|------------|-------------|
   * | length     | prefix     | buf.index  | offset      |
   * |------------|------------|------------|-------------|
   *
   * */

@lidavidm
Copy link
Member

lidavidm commented Nov 5, 2024

ah...are the requirements for variable width vector different? if they must be 0-padded then we can't do anything here.

@lidavidm
Copy link
Member

lidavidm commented Nov 5, 2024

No, that diagram only applies to short strings. It's not specified what happens to long strings.

@vibhatha
Copy link
Contributor

vibhatha commented Nov 5, 2024

It would be better to write some test cases and evaluate this.

@ViggoC
Copy link
Contributor Author

ViggoC commented Nov 5, 2024

if they must be 0-padded then we can't do anything here.

"Any remaining bytes after the string itself are padded with 0."
https://arrow.apache.org/docs/format/Columnar.html#:~:text=Any%20remaining%20bytes%20after%20the%20string%20itself%20are%20padded%20with%200.

@ViggoC
Copy link
Contributor Author

ViggoC commented Nov 5, 2024

No, that diagram only applies to short strings. It's not specified what happens to long strings.

yeah, when i mentioned padding, i mean short string, there is no padding data for long string. for long string, every bytes in viewbuffer will be overwritten. so at least we can skip viewbuffer.setZero for long string.

@ViggoC
Copy link
Contributor Author

ViggoC commented Nov 5, 2024

commit 30e595411e37b7ca2739c7005442ed7b329eeaea. the benchmark only involves long string, so it shows a big improvement.

Benchmark                                           (step)  Mode  Cnt  Score   Error  Units
VariableWidthViewVectorBenchmarks.setSafeFromArray       1  avgt    5  0.024 ± 0.002  ms/op
VariableWidthViewVectorBenchmarks.setSafeFromArray       2  avgt    5  0.024 ± 0.002  ms/op
VariableWidthViewVectorBenchmarks.setSafeFromArray      10  avgt    5  0.024 ± 0.003  ms/op
VariableWidthViewVectorBenchmarks.setSafeFromArray      40  avgt    5  0.024 ± 0.008  ms/op

image

@assignUser assignUser transferred this issue from apache/arrow Nov 26, 2024
@assignUser assignUser added the Type: enhancement New feature or request label Nov 26, 2024
@ViggoC
Copy link
Contributor Author

ViggoC commented Jan 23, 2025

Here is the result of VariableWidth(View)VectorInlineValueBenchmarks
before

Benchmark                                                  (step)  Mode  Cnt  Score    Error  Units
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray       1  avgt    5  0.017 ±  0.001  ms/op
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray       2  avgt    5  0.018 ±  0.001  ms/op
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray      10  avgt    5  0.017 ±  0.001  ms/op
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray      40  avgt    5  0.017 ±  0.001  ms/op
VariableWidthVectorInlineValueBenchmarks.setSafeFromArray       1  avgt    5  0.003 ±  0.001  ms/op
VariableWidthVectorInlineValueBenchmarks.setSafeFromArray       2  avgt    5  0.005 ±  0.001  ms/op
VariableWidthVectorInlineValueBenchmarks.setSafeFromArray      10  avgt    5  0.013 ±  0.001  ms/op
VariableWidthVectorInlineValueBenchmarks.setSafeFromArray      40  avgt    5  0.044 ±  0.001  ms/op

after comment out setZero

VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray       1  avgt    5  0.002 ±  0.001  ms/op
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray       2  avgt    5  0.002 ±  0.001  ms/op
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray      10  avgt    5  0.002 ±  0.001  ms/op
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray      40  avgt    5  0.003 ±  0.001  ms/op

Checking first and then deciding whether to skip setZero, as done in the commit 30e595411e37b7ca2739c7005442ed7b329eeaea, does not work in the benchmark because the vector is always reused.

@ViggoC
Copy link
Contributor Author

ViggoC commented Jan 23, 2025

After change the setup level and call zeroVector in prepare (see 374357d), the check and set works for step 1 and 2. For step 10 and 40 cases, the performance decrease a lot because we run out of the pre allocated memory, setZero is required for the reallocated memory.

VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray       1  avgt    5  0.004 ±  0.001  ms/op
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray       2  avgt    5  0.004 ±  0.001  ms/op
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray      10  avgt    5  0.022 ±  0.004  ms/op
VariableWidthViewVectorInlineValueBenchmarks.setSafeFromArray      40  avgt    5  0.032 ±  0.002  ms/op

Flame graph for step 1 / 2
Image

Flame graph for step 10 / 40

Image

@ViggoC
Copy link
Contributor Author

ViggoC commented Jan 23, 2025

@lidavidm Combined with memory pre-allocation, the solution in PR #557 effectively reduces the performance overhead of setZero for both long and short values.

@lidavidm
Copy link
Member

Thanks for investigating this!

lidavidm pushed a commit that referenced this issue Jan 24, 2025
@vibhatha
Copy link
Contributor

@ViggoC Thanks a lot. This is awesome work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants