Automating data updates without running queries within user session #1056
-
I am building a golem app that will be hosted on Shiny Server, and the data supplying the app is expected to be updated daily. The data can be accessed from a SQL database, but so far I have avoided putting SQL queries inside the app because (1) some of the queries take more than a minute to complete and (2) the data need fairly demanding cleaning before they can be used in the app. I don't want users to have to wait for the data to be queried and cleaned, but I am not sure how to automate data updates within the golem framework outside of the user session. Any tips on this?

My original thought was to have a cron job that pulls, edits, and replaces the data (as a CSV) used by the app, but I can't get the app to use this data without reloading and documenting the whole app package again. With a regular Shiny app (one that isn't a package), simply replacing the CSVs within the app directory works just fine. Is there no solution for updating data regularly besides querying the data within the app?
-
I once had a similar problem... but I think you kind of answered your question already. Simply do not make the cron job replace the data inside the app package itself; have it write the cleaned data to an intermediate location that the app reads from. So basically:
(By the way, I think it is good for security purposes that you cannot simply push data from a shell job outside into your app, if I understand correctly what you are saying. :) )

As you already rightly said: the point is to avoid any unnecessary processing at the individual app-instance level and carry it out at "the top level" instead, or in any case before the data is loaded into an app session. This is beneficial whenever multiple users require the same updates/pre-processing, since the work is then carried out once (beforehand) and not within each app (or R package, for that matter) over and over again, which would just be unnecessary work. It also helps the overall user experience, since, as you suggested, it avoids time lags due to computations.

Also, when the app pulls (instead of the cron job forcing updates from outside into the app), you have finer control over what type of data is pulled by which user/app instance in the future, which can matter depending on your project and where development goes. I think what you are writing is already kind of the solution, except for the intermediate place to store the clean/updated data, but maybe I did not understand your question right.
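For the "app pulls at session start" part, a minimal sketch in plain Shiny terms (the path, file name, and output are assumptions; point them at wherever your cron job writes the cleaned output):

```r
# In the golem app: read pre-cleaned data at runtime instead of querying and
# cleaning inside the session, so the package never has to be rebuilt.
app_server <- function(input, output, session) {
  cleaned_path <- "/srv/shiny-data/cleaned_data.rds"  # hypothetical location

  # reactiveFileReader re-reads the file only when its modification time
  # changes, so a running session picks up the daily update without a restart
  app_data <- shiny::reactiveFileReader(
    intervalMillis = 60 * 1000,  # check once a minute
    session = session,
    filePath = cleaned_path,
    readFunc = readRDS
  )

  output$preview <- shiny::renderTable(head(app_data()))
}
```

Because the data lives outside the installed package, the cron job can overwrite it freely; this sidesteps the "reload and document the whole package" problem from the original question.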
-
I'll do my best, though I am really not an expert and have used the things below to somehow get through some projects... It's also a bit difficult to give specific advice, as the context of the project matters, but two things come to mind. The first is informal and ad hoc; the second is preferable if your project demands some sort of professionalism (you have to comply with data-sharing agreements, and/or it is long term and a professional setup is a wise choice, since you can implement new features from a solid foundation in the future).

1. Scenario:

You use your own machine (literally your laptop) as "the server". Query from the MySQL/Azure databases, run the cleaning/updates locally, and store the results in a new SQL database (or in a CSV in Dropbox, which is really dodgy, but if you are allowed to, i.e. the data is not sensitive and may be stored locally, why not). Finally, make your app pull from the new SQL database, or from another place of your choice (Dropbox...), with light (computationally undemanding) SQL queries, since the computationally demanding tasks have already been done on your laptop. A sketch of such a scheduled cleaning script follows after the lists below.

Benefits:

- Quick and informal to set up: no extra infrastructure beyond your own machine, and you reuse your existing query/cleaning code as-is.
Drawbacks:

- Your own machine has to be on and connected for the scheduled updates to run, and the whole setup stays informal and ad hoc (see Scenario 2 for the more professional route).
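A minimal sketch of the scheduled script for Scenario 1, assuming a DBI/odbc connection; the connection details, table name, paths, and the `clean_my_data()` helper are all placeholders standing in for your own code:

```r
# update_data.R -- run daily via cron, e.g.
#   0 3 * * * Rscript /home/me/update_data.R
library(DBI)

con <- DBI::dbConnect(
  odbc::odbc(),
  Driver   = "ODBC Driver 17 for SQL Server",  # placeholder driver
  Server   = "my-sql-server",                  # placeholder host
  Database = "my_db",                          # placeholder database
  UID      = Sys.getenv("DB_USER"),
  PWD      = Sys.getenv("DB_PASS")
)

raw <- DBI::dbGetQuery(con, "SELECT * FROM big_table")  # the slow query
DBI::dbDisconnect(con)

cleaned <- clean_my_data(raw)  # your existing cleaning code (placeholder)

# Write atomically so the app never reads a half-written file
target <- "/srv/shiny-data/cleaned_data.rds"
tmp <- paste0(target, ".tmp")
saveRDS(cleaned, tmp)
file.rename(tmp, target)  # atomic rename on the same filesystem
```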
2. Scenario:

You set up another "server" that does the data cleaning. This is basically the same R code that you would run on your local machine, but now you write a plumber API around it, i.e. embed the R data-cleaning code inside the plumber framework (https://www.rplumber.io/index.html). You can trigger/call the computations/cleaning from outside, e.g. from your local machine or from your external Shiny app via GET requests: the computations are done by the server, and the results/cleaned data are sent back from there into your app.

There is still the question of how to get the "raw" data into your R/plumber construction on that server: if you can make a cron job (run either locally on your machine or automatically on the server) send the data updates from Azure/MySQL to your plumber/R sessions on the server on a schedule, then you are done, and this is what I meant by "running the cron job against the server instance".

Alternatively, you could have a cron job that does not run against the above setup, but instead moves the data from Azure/MySQL to a new SQL database, with some server-side query/cleaning steps in between (if you are allowed to and the computations are not complicated). Your app then queries directly from that new SQL database (which holds the cleaned data).
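To make the plumber idea concrete, here is a minimal sketch; the endpoint names, cache path, and the `fetch_from_azure()`/`clean_my_data()` helpers are assumptions standing in for your own query and cleaning code:

```r
# plumber.R -- wrap the existing query + cleaning code in a small API

#* Trigger the query + cleaning pipeline and cache the result on the server
#* @get /refresh
function() {
  raw <- fetch_from_azure()      # your existing query code (placeholder)
  cleaned <- clean_my_data(raw)  # your existing cleaning code (placeholder)
  saveRDS(cleaned, "/srv/api-data/cleaned_data.rds")
  list(status = "ok", rows = nrow(cleaned))
}

#* Serve the cached, cleaned data
#* @get /data
#* @serializer rds
function() {
  readRDS("/srv/api-data/cleaned_data.rds")
}
```

Start it with `plumber::pr("plumber.R") |> plumber::pr_run(port = 8000)`; a cron job can then hit `/refresh` on a schedule (e.g. with `curl`), and the Shiny app pulls the already-cleaned data with a light GET request, for example:

```r
resp <- httr::GET("http://api-host:8000/data")       # host is a placeholder
app_data <- unserialize(httr::content(resp, "raw"))  # body from the rds serializer
```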