Comparable benchmarks: polars and duckplyr #2

Open
etiennebacher opened this issue Apr 18, 2024 · 3 comments

Comments

@etiennebacher

Hi @jrosell, I have a question regarding the code you benchmark. You say in the README:

I've added a file copy and file reading steps in each benchmark method to be sure to compare the pipelines without caching and a maximum of 8 threads.

I don't see a problem with that, but in the code I see that file.copy() is only called for the polars and tidypolars methods, not for the others.

1br/run.1e9.R

Lines 29 to 32 in 456afff

scan_polars_streaming = {
print("scan_polars_streaming")
file.copy(file_name, "measurements.csv", overwrite = TRUE)
df <- pl$scan_csv("measurements.csv")$

1br/run.1e9.R

Lines 44 to 47 in 456afff

scan_tidypolars_dplyr_streaming = {
print("scan_tidypolars_dplyr_streaming")
file.copy(file_name, "measurements.csv", overwrite = TRUE)
df <- pl$scan_csv("measurements.csv")

1br/run.1e9.R

Lines 60 to 62 in 456afff

duckplyr_df_from_csv = {
print("duckplyr_df_from_csv")
df <- duckplyr::duckplyr_df_from_csv(file_name) |>

So it seems to me that this inflates the time for those two packages only. Is this a mistake or am I missing something?
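For illustration, here is a minimal sketch of what a uniform setup could look like, so that every method pays the same copy cost. It uses base R only; `with_fresh_copy()`, the two stand-in methods, and the tiny sample file are hypothetical and not taken from the repository. In the real script the stand-ins would be the polars, tidypolars, and duckplyr pipelines.

```r
# Hypothetical helper (not from the repository): copy the input to a fresh
# working file before running a method, so the copy cost is the same for all.
with_fresh_copy <- function(source_file, method) {
  work_file <- tempfile(fileext = ".csv")
  on.exit(unlink(work_file), add = TRUE)  # clean up after the method returns
  copied <- file.copy(source_file, work_file, overwrite = TRUE)
  stopifnot(copied)
  # The method must fully materialise its result before returning,
  # because the working file is deleted on exit.
  method(work_file)
}

# Tiny stand-in data set so the sketch runs without the 1e9-row file.
file_name <- tempfile(fileext = ".csv")
writeLines(c("station;temp", "A;1.5", "B;-3.0", "A;2.5"), file_name)

# Stand-in methods: each takes the path of its own fresh copy.
methods <- list(
  base_read_csv = function(path) {
    df <- read.csv(path, sep = ";")
    aggregate(temp ~ station, data = df, FUN = mean)
  },
  base_scan = function(path) {
    parts <- strsplit(readLines(path)[-1], ";", fixed = TRUE)
    station <- vapply(parts, `[`, character(1), 1)
    temp <- as.numeric(vapply(parts, `[`, character(1), 2))
    tapply(temp, station, mean)
  }
)

# Every method goes through the same wrapper, so copy time is included
# in all timings rather than in only some of them.
results <- lapply(methods, function(m) with_fresh_copy(file_name, m))
timings <- vapply(
  methods,
  function(m) system.time(with_fresh_copy(file_name, m))[["elapsed"]],
  numeric(1)
)
print(timings)
```

The alternative would be to drop the copy from every method; either way the point is only that the setup step is identical across the benchmark.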

@jrosell
Owner

jrosell commented Apr 18, 2024

This is unfinished work, and the benchmark is not correct right now. I want to add more methods to the run.1e9.R file, and I will fix this.

@etiennebacher
Author

I see, thanks. Closing then.

@jrosell
Owner

jrosell commented May 16, 2024

The copy step is now OK, and I see that duckplyr is faster than polars on my computer, BUT I don't trust the thread setting. I'm leaving this open while waiting for more info from tidyverse/duckplyr#165.

jrosell reopened this May 16, 2024
jrosell changed the title from "Question on the file reading" to "Comparable benchmarks: polars and duckplyr" May 16, 2024