The goal of {jecfa} is to implement a pipeline to download and process JECFA monographs, offering a Docker image to run the pipeline in isolation, independently of the OS and locally installed dependencies.

The project is orchestrated by the targets package, a pipeline toolkit for R (Landau 2024). The pipeline is defined in `_targets.R`, and the main functions live in the `R/` folder.
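For orientation, a targets pipeline is declared in `_targets.R` as a list of `tar_target()` calls. The following is a minimal, hypothetical sketch (all function names, and every target name except `jecfaDistiller`, are invented for illustration and do not match the real pipeline):

``` r
# _targets.R -- illustrative sketch only; the real pipeline differs
library(targets)

tar_source("R")  # load the functions defined in the R/ folder

list(
  tar_target(trs_pdfs, download_trs("data/TRS")),        # hypothetical
  tar_target(monographs, parse_monographs(trs_pdfs)),    # hypothetical
  tar_target(jecfaDistiller, distill_jecfa(monographs))  # hypothetical
)
```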
The main folders of the project are:

- `dev/`: contains the development scripts and files, i.e., the requirements, the scripts to execute the pipeline interactively and in the background (in RStudio), and a checklist to run the pipeline.
- `R/`: contains the main functions of the project. In RStudio you can jump to a function's definition in the correct file by pressing `F2` on the function name wherever it is used in the project, and you can access the function's documentation by pressing `F1` on the function name (or with the standard `?<function_name>` in the console).
- `data/`: contains the data saved/downloaded/used in the project, i.e.:
  - `FAS/`: contains the FAS PDFs (provided externally)
  - `TRS/`: contains the TRS PDFs (downloaded by the pipeline)
  - `TRS_unique/`: contains the TRS PDFs without the ID
  - `used/`: contains the data used in the text-mining (tm) phase
- `output/`: contains the output of the pipeline, i.e., the tables and the figures, and the shared objects defined in the pipeline.
- `report/`: contains the reports and exploratory scripts of the pipeline, i.e., the Rmd files and the corresponding rendered HTMLs.
After the first run of the pipeline, you can access every object defined in the pipeline by running `tar_read(<unquoted_target_name>)` in the R console (even in a new session within the project). E.g., you can always retrieve the final `jecfaDistiller` dataset by running:

``` r
library(targets)
tar_read(jecfaDistiller)
```
From then on, every time you run the pipeline, only the steps needed to update the targets defined as `tar_target(...)` in the pipeline script `_targets.R` will be re-run.
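If you want to check beforehand which targets are no longer up to date (and would therefore be re-run), the targets package provides `tar_outdated()`:

``` r
library(targets)
tar_outdated()  # returns the names of the outdated targets
```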
If you would like to see the state of the pipeline, you can run `tar_visnetwork()` to display the pipeline as a network graph. Note that `tar_visnetwork()` shows all the functions and dependencies of the pipeline, so it can be a bit overwhelming; we suggest calling `tar_visnetwork(targets_only = TRUE)` to see only the targets (i.e., the objects) of the pipeline.
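For example, from the R console with the project active:

``` r
library(targets)
tar_visnetwork()                     # full graph: targets plus functions
tar_visnetwork(targets_only = TRUE)  # targets only, easier to read
```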
To run the pipeline, you need RStudio, with the project opened in it (this is not strictly required, but we assume it, and we do not discuss/document alternatives). We provide two main ways to do that: using your local RStudio environment, or an RStudio Server within a provided, pre-configured Docker container.

NOTE: the system is not designed to be run by the user directly, so trying to execute pieces of the pipeline interactively will lead to errors. To explore objects and use the provided functions, you can use the `report/explore.R` script/template, as sketched below.
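For illustration only (the actual `report/explore.R` template may differ), such an exploration session could look like this:

``` r
# Illustrative sketch only: the actual report/explore.R template may differ.
library(targets)

jecfa_data <- tar_read(jecfaDistiller)  # read a pipeline object
str(jecfa_data)                         # inspect its structure
```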
The easiest way to activate/use the project and run the pipeline is to use Docker. For that, you need Docker installed on your machine; if you do not have it, you can follow the instructions on the Docker website.

Next, download the `docker-compose.yaml` file into any folder of your host machine/computer, make sure the Docker engine is running, go to a terminal window (on Windows use CMD; we did not test the execution on PowerShell), and run:

```
docker-compose up --build --detach
```
to start the container. Next, you can visit the address `localhost:18787` in any browser on your host machine to access an RStudio Server with all the dependencies installed and ready to work. The username is `rstudio`, and the password is `jecfa`. Once inside RStudio Server, enter the `jecfa/` folder (the only one you see there) and click the `jecfa.Rproj` project file to activate the project.
You are now ready to run the pipeline, either interactively by calling `.run()` in the console, or in the background by calling `.background_run()` in the console. Moreover, every time you spin up the container you have access to all the targets' objects created/updated by the last pipeline execution, by calling `tar_read(<unquoted_target_name>)` in the R console within RStudio Server.
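Putting it together, a typical console session inside the container could look like this (`jecfaDistiller` is used as an example target):

``` r
.run()             # run the pipeline interactively, or...
.background_run()  # ...launch it as a background job instead

# On this and on every later container start-up:
library(targets)
tar_read(jecfaDistiller)
```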
Once you have finished your work, you can stop the container by running:

```
docker-compose down
```
If you cloned the project from GitHub, the Docker image is defined in the `Dockerfile`. The `docker-compose.yaml` file is used to set up the environment, and the `Makefile` is used to manage the container. You can start the container by running `make up` in the terminal, and stop it by running `make down`. You can also run the container directly with custom options (bypassing compose) by running `make run` in the terminal (see the `run` target of the `Makefile`).
To run the container effectively, we suggest binding some internal folders to the host machine, so that you do not lose your work when the container is stopped. By default, `docker-compose.yaml` binds the `_targets/`, `data/`, `output/`, and `report/` folders to the host machine using dedicated Docker volumes. You can change this configuration by editing the `docker-compose.yaml` file (e.g., to bind them to accessible folders on your host machine; note that we discourage this).
To run the pipeline in your local RStudio, you need to clone the project from GitHub and open it in RStudio. To do that, open RStudio, click on `File` -> `New Project...` -> `Version Control` -> `Git`, and paste the URL of the project repository into the `Repository URL` field.
You will also need to install all the dependencies of the project. The project's dependencies are automatically managed by renv (Ushey and Wickham 2024), so you need to install renv and restore the project dependencies. To do that, open the project in RStudio, run `renv::restore()`, and follow the instructions.
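A minimal sketch of that step (assuming renv itself is not installed yet):

``` r
install.packages("renv")  # only if renv is not already installed
renv::restore()           # restore the packages recorded in renv.lock
```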
After that, you can open the project as usual and run the pipeline either interactively by calling `.run()` in the console, or in the background by calling `.background_run()` in the console.

Note: a start-up message will guide you on how to run the pipeline every time you start the project.
REMINDER: you only need to run the pipeline to execute/update it, i.e., to create/update the targets. In general this is required only once (or when something is updated). Once you have executed the pipeline at least once, you can always access every defined target by calling `tar_read(<unquoted_target_name>)` in the R console!
This project can be easily installed as an R package, giving you access to all the functions and dataset documentation. You have two main options for using the project: installing it as a package, or loading it directly from the cloned repository.

- Install the Package from GitHub:

  You can install the package directly from GitHub using the devtools package. If you don't have `devtools` installed, you can install it first:

  ``` r
  install.packages("devtools")
  ```

  Then, install the package using:

  ``` r
  devtools::install_github("UBESP-DCTV/jecfa")
  ```

- Load the Package:

  After installation, load the package with:

  ``` r
  library(jecfa)
  ```
- Clone the Repository:

  First, clone the repository to your local machine using Git:

  ```
  git clone https://github.com/UBESP-DCTV/jecfa
  cd jecfa
  ```

- Load the Project without Installation:

  If you prefer not to install the package, you can load it directly from the cloned repository (in an R session started/activated on the project) using `devtools`:

  ``` r
  devtools::load_all()
  ```
Regardless of the method you choose, you can easily access the functions and dataset documentation. For example, to view the documentation for the `jecfaDistiller` dataset, use:

``` r
?jecfaDistiller
```

Similarly, you can access the documentation for the other datasets and functions included in the package using their respective names:

``` r
?jecfa_tm_full
?jecfa_augmented
?jecfa
```
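To browse everything the package documents in one place, you can also use R's standard package help index:

``` r
help(package = "jecfa")  # index of all documented functions and datasets
```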
Please note that the jecfa project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Landau, William Michael. 2024. *Targets: Dynamic Function-Oriented Make-Like Declarative Pipelines*. https://docs.ropensci.org/targets/.

Ushey, Kevin, and Hadley Wickham. 2024. *Renv: Project Environments*. https://rstudio.github.io/renv/.