-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
qs file format as cache file format #304
Conversation
…e for qs files) RData: .RData and .hash qs: .qs and .qs.hash
Thanks @gisler ! I'll have time over the weekend to review in detail. Please don't think I'm ignoring this... |
No worries, we all have our duties, but I appreciate your message nonetheless. |
Thanks for this @gisler, And thanks for writing clear tests as well! I noticed that you are using ifelse statements for implementing the cashing strategy. This works if we never are going to support other caching formats other than qs. This might not be the case. What do you think about implementing a strategy pattern with a little bit of meta-programming? What I mean is using a pattern like is used for loading different data types. This is a simple list with the cache type prefix followed by a function that implements the caching strategy. For example, in
Then the code could read:
Of course I mean this a pseudo-code and it is not guaranteed to work, etc.... What do you think? |
Hi, you're welcome, glad you like it, @KentonWhite. And yes, I did include tests, although I must say that I more or less only copied and adapted existing tests. Anyway, while I very much like metaprogramming (I use it quite often in my It achives about what you suggest, but without the parsing of strings. If you like, I can change the statements so that they make use of this function for future reference. |
@gisler The reason I was thinking about meta programming was eliminating the need for someone to find all of the places where a cache file format is used. With meta programming adding a new format is as easy as changing the table with a new function.
I didn’t look for all of the places where there is a switch. If their is only 1 place in the code where this switch is made then the current way is fine. If there are multiple places meta programming might be easier maintenance.
Is there only the 1 place where a code switch is needed?
Kenton White
… On Nov 29, 2020, at 11:49 AM, Gerold Hepp ***@***.***> wrote:
Hi, you're welcome, glad you like it, @KentonWhite. And yes, I did include tests, although I must say that I more or less only copied and adapted existing tests.
Anyway, while I very much like metaprogramming (I use it quite often in my DTSg package), I don't think it is necessary here. One could either use an if ... else if ... else statement to include more formats in the future or probably more elegant the rather unknown switch() function: https://stat.ethz.ch/R-manual/R-devel/library/base/html/switch.html
It achives about what you suggest, but without the parsing of strings. If you like, I can change the statements so that they make use of this function for future reference.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Ah, I see. Well, there is one for saving, one for loading and one defining the file extensions. If the config equals the file extension (which it actually does as it is now), we can easily erase the need to intervene there. This is a good idea of yours. Erasing the rest is a bit more complicated though, as there are two expressions involved and for future formats usually also the requirement to load a package. It should be possible nonetheless using |
Hi @KentonWhite, I committed a rough implementation of your idea (at least I hope so) yesterday. It still requires some polishing and explanatory comments though: Lines 171 to 184 in 88343b8
So with this, cache file formats could in future be specified by a named list of named expressions. One just has to keep in mind to add potentially required packages to the suggested packages list. One drawback though is that the expressions used for saving and loading depend on objects living in the respective functions only. This requires careful explanation. |
Thanks @gisler! I'll take a more detailed look over the weekend. |
Perfect, I finished the polishing and explanations in the meantime. |
Sorry, had to "kick" in a final commit to this PR. The code should be "cleaner" now. |
@KentonWhite Oh, before I forget it and since you are preparing a release: https://github.com/KentonWhite/ProjectTemplate/blob/f0f07ec386c70674647b55773cf0974905dd7603/R/project.config.R is missing I don't want to include a fix in this PR and it's kinda not worth to open a separate Issue or PR for it, but I can do so in case you prefer. |
Thanks @gisler Looks good! One question and 1 comment before I merge in: Have you checked that migrate project works as expected? It should and is always nice to confirm 👍 Since I needed to bump the version to 0.9.3 to fix an error with the new R release, can you change the version to 0.10.0 please. |
Okay, no problem, @KentonWhite. I tested migrating the project and it works, although I wonder why it prints
And it made me add a |
By the way, I just did a few non-representative benchmarks on my machine:
I generally used default settings. File sizes are in the case of
Scaling
and like this for reading
File sizes scaled approximately linearly. Leaving one column out at a time, changes the timing, but not the ranking and adding a
Maybe it's different with less random data and non-defaults. I only tried out
Anyway, I think
P.S. I know, it's strange to do such tests once the work is done and not before it, but then, I knew EDIT: P.P.S. Did some more experimenting: The read and write functions from the |
The migrate test looks bizarre. I will try it on an existing project and see if I get the same issues. |
It looks good! merging after tests have passed. |
Seems like the function is printing if the cache directory is not existing, which was And cool, looking forward to make use of it in production. Hopefully, there are no bugs. |
Types of change
Pull request checklist
man
Proposed changes
As shortly discussed in #225, I implemented "qs" as a possible cache file format (https://CRAN.R-project.org/package=qs). The default stays the "RData" file format for now though.
Design considerations:
cache
function doesn't require a loaded project to work. I tried to keep it like this. This, however, has the drawback that using thecache
function without a loaded project might not consider possible config overrides..RData
files already exist and the file format is changed to.qs
, new.qs
files are created upon loading the project. This includes the.hash
files. The following list shows the file extensions of the different cache file formats:.RData
and.hash
.qs
and.qs.hash
The latter also implies that
clear.cache
only deletes the files of the format specified in the config and leaves the other files untouched (might need a test there, but first want to know your opinion about this).