Automating data updates without running queries within user session #1056
-
I am building a golem app that will be hosted on Shiny Server, and the data supplying the app is expected to be updated daily. The data can be accessed from a SQL database, but so far I have avoided putting SQL queries inside the app because (1) some of the queries take more than a minute to complete and (2) the data need fairly demanding cleaning before they can be used in the app. I don't want users to have to wait for the data to be queried and cleaned, but I am not sure how to automate data updates within the golem framework outside of the user session. Any tips on this?

My original thought was to have a cron job that pulls, edits, and replaces the data (as a CSV) used by the app, but I can't get the app to use this data without reloading and documenting the whole app package again. With a regular Shiny app (one that isn't a package), simply replacing the CSVs within the app directory works just fine. Is there no solution for updating data regularly besides querying the data within the app?
-
I once had a similar problem... but I think you kind of answered your question already. Simply do not make the cron job replace the data inside the app package itself; have it write the cleaned data to an intermediate location that the app reads from. So basically:
(By the way, I think it is good for security purposes that you cannot simply push data from a shell job outside into your app, if I understand correctly what you are saying. :) )

As you already rightly said: the point is to avoid any unnecessary processing at the individual app-instance level and carry it out at "the top level" instead, or in any case before the data is loaded into an app session. This is beneficial whenever multiple users require the same updates/pre-processing, since the work is then carried out once (beforehand) and not within each app (or R package, for that matter) over and over again, which would just be unnecessary work. It also helps the overall user experience, since, as you suggested, it avoids time lags due to computations.

Also, when the app pulls (instead of the cron job forcing updates from outside into the app), you have finer control over what type of data is pulled by which user/app instance in the future, which can matter depending on your project and where development goes. I think what you are writing is already kind of the solution, except for the intermediate place to store the clean/updated data, but maybe I did not understand your question right.
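For the "app pulls at session start" part, a minimal sketch in plain Shiny terms (the path, file name, and output are assumptions; point them at wherever your cron job writes the cleaned output):

```r
# In the golem app: read pre-cleaned data at runtime instead of querying and
# cleaning inside the session, so the package never has to be rebuilt.
app_server <- function(input, output, session) {
  cleaned_path <- "/srv/shiny-data/cleaned_data.rds"  # hypothetical location

  # reactiveFileReader re-reads the file only when its modification time
  # changes, so a running session picks up the daily update without a restart
  app_data <- shiny::reactiveFileReader(
    intervalMillis = 60 * 1000,  # check once a minute
    session = session,
    filePath = cleaned_path,
    readFunc = readRDS
  )

  output$preview <- shiny::renderTable(head(app_data()))
}
```

Because the data lives outside the installed package, the cron job can overwrite it freely; this sidesteps the "reload and document the whole package" problem from the original question.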
-
I'll do my best, though I am really not an expert and have used the things below to somehow get through some projects... It's also a bit difficult to give specific advice, as the context of the project matters, but two things come to mind. The first is informal and ad hoc; the second is preferable if your project demands some sort of professionalism (you have to comply with data-sharing agreements, and/or it is long term and a professional setup is a wise choice, since you can implement new features from a solid foundation in the future).

1. Scenario:

You use your own machine (literally your laptop) as "the server". Query from the MySQL/Azure databases, run the cleaning/updates locally, and store the results in a new SQL database (or in a CSV in Dropbox, which is really dodgy, but if you are allowed to, i.e. the data is not sensitive and may be stored locally, why not). Finally, make your app pull from the new SQL database, or from another place of your choice (Dropbox...), with light (computationally undemanding) SQL queries, since the computationally demanding tasks have already been done on your laptop. A sketch of such a scheduled cleaning script follows after the lists below.

Benefits:

- Quick and informal to set up: no extra infrastructure beyond your own machine, and you reuse your existing query/cleaning code as-is.
Drawbacks:

- Your own machine has to be on and connected for the scheduled updates to run, and the whole setup stays informal and ad hoc (see Scenario 2 for the more professional route).
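A minimal sketch of the scheduled script for Scenario 1, assuming a DBI/odbc connection; the connection details, table name, paths, and the `clean_my_data()` helper are all placeholders standing in for your own code:

```r
# update_data.R -- run daily via cron, e.g.
#   0 3 * * * Rscript /home/me/update_data.R
library(DBI)

con <- DBI::dbConnect(
  odbc::odbc(),
  Driver   = "ODBC Driver 17 for SQL Server",  # placeholder driver
  Server   = "my-sql-server",                  # placeholder host
  Database = "my_db",                          # placeholder database
  UID      = Sys.getenv("DB_USER"),
  PWD      = Sys.getenv("DB_PASS")
)

raw <- DBI::dbGetQuery(con, "SELECT * FROM big_table")  # the slow query
DBI::dbDisconnect(con)

cleaned <- clean_my_data(raw)  # your existing cleaning code (placeholder)

# Write atomically so the app never reads a half-written file
target <- "/srv/shiny-data/cleaned_data.rds"
tmp <- paste0(target, ".tmp")
saveRDS(cleaned, tmp)
file.rename(tmp, target)  # atomic rename on the same filesystem
```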
2. Scenario:

You set up another "server" that does the data cleaning. This is basically the same R code that you would run on your local machine, but now you write a plumber API around it, i.e. embed the R data-cleaning code inside the plumber framework (https://www.rplumber.io/index.html). You can trigger/call the computations/cleaning from outside, e.g. from your local machine or from your external Shiny app via GET requests: the computations are done by the server, and the results/cleaned data are sent back from there into your app.

There is still the question of how to get the "raw" data into your R/plumber construction on that server: if you can make a cron job (run either locally on your machine or automatically on the server) send the data updates from Azure/MySQL to your plumber/R sessions on the server on a schedule, then you are done, and this is what I meant by "running the cron job against the server instance".

Alternatively, you could have a cron job that does not run against the above setup, but instead moves the data from Azure/MySQL to a new SQL database, with some server-side query/cleaning steps in between (if you are allowed to and the computations are not complicated). Your app then queries directly from that new SQL database (which holds the cleaned data).
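To make the plumber idea concrete, here is a minimal sketch; the endpoint names, cache path, and the `fetch_from_azure()`/`clean_my_data()` helpers are assumptions standing in for your own query and cleaning code:

```r
# plumber.R -- wrap the existing query + cleaning code in a small API

#* Trigger the query + cleaning pipeline and cache the result on the server
#* @get /refresh
function() {
  raw <- fetch_from_azure()      # your existing query code (placeholder)
  cleaned <- clean_my_data(raw)  # your existing cleaning code (placeholder)
  saveRDS(cleaned, "/srv/api-data/cleaned_data.rds")
  list(status = "ok", rows = nrow(cleaned))
}

#* Serve the cached, cleaned data
#* @get /data
#* @serializer rds
function() {
  readRDS("/srv/api-data/cleaned_data.rds")
}
```

Start it with `plumber::pr("plumber.R") |> plumber::pr_run(port = 8000)`; a cron job can then hit `/refresh` on a schedule (e.g. with `curl`), and the Shiny app pulls the already-cleaned data with a light GET request, for example:

```r
resp <- httr::GET("http://api-host:8000/data")       # host is a placeholder
app_data <- unserialize(httr::content(resp, "raw"))  # body from the rds serializer
```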