Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slice_sample() does not interact correctly with set.seed() when working with an SQLite database. #1344

Closed
stephenashton-dhsc opened this issue Aug 7, 2023 · 4 comments

Comments

@stephenashton-dhsc
Copy link

I'm not too sure if this issue sits within dbplyr or RSQLite, but the slice_sample() function does not appear to be taking account of a seed set via set.seed() when using an SQLite database.

If I set a seed, then run slice_sample(), then later set the same seed, and run slice_sample() again, the sampled table is different.

For comparison, when undertaking these sample operations using a Microsoft SQL Server database, the expected behaviour is observed (i.e., the two sampled tables are identical).

library(DBI)
library(RSQLite)
library(dplyr)
library(dbplyr)


tbl <- tibble(n = 1:30)
conn <- dbConnect(SQLite(), ":memory:")
dbWriteTable(conn, "test_table", tbl)

set.seed(100)
sample1 <- slice_sample(tbl(conn, "test_table"), n = 3)

set.seed(100)
sample2 <- slice_sample(tbl(conn, "test_table"), n = 3)

identical(sample1, sample2)
@stephenashton-dhsc
Copy link
Author

If this is out of the control of R packages (for example, if the issue is with SQLite itself), would it be possible to return a warning to inform the user?

@mgirlich
Copy link
Collaborator

mgirlich commented Aug 7, 2023

set.seed() can't affect the database (how could it as it doesn't even have any argument related to the database). This is true for SQLite and SQL Server and any other database. If slice_sample() consistently samples the same rows for SQL Server this would be a bug.
A warning doesn't make much sense here and would mostly annoy users.

Sorry, but I can't help here.

@mgirlich mgirlich closed this as completed Aug 7, 2023
@krlmlr
Copy link
Member

krlmlr commented Aug 7, 2023

There are ways to make RAND() deterministic in SQL Server: https://database.guide/how-to-make-rand-deterministic-in-sql-server/ .

@nathanhaigh
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants