-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
114 lines (80 loc) · 6.85 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
error = FALSE,
warning = FALSE,
message = FALSE,
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# covidphtext: Utilities to Extract Text Data From COVID-19-Related Resolutions and Policies From the Philippines <img src="man/figures/covidphtext.png" width="200px" align="right" />
<!-- badges: start -->
[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
[![CRAN status](https://www.r-pkg.org/badges/version/covidphtext)](https://CRAN.R-project.org/package=covidphtext)
[![CRAN](https://img.shields.io/cran/l/covidphtext.svg)](https://CRAN.R-project.org/package=covidphtext)
[![CRAN](http://cranlogs.r-pkg.org/badges/covidphtext)](https://CRAN.R-project.org/package=covidphtext)
[![CRAN](http://cranlogs.r-pkg.org/badges/grand-total/covidphtext)](https://CRAN.R-project.org/package=covidphtext)
[![R build status](https://github.com/como-ph/covidphtext/workflows/R-CMD-check/badge.svg)](https://github.com/como-ph/covidphtext/actions)
[![test-coverage](https://github.com/como-ph/covidphtext/workflows/test-coverage/badge.svg)](https://github.com/como-ph/covidphtext/actions?query=workflow%3Atest-coverage)
[![Codecov test coverage](https://codecov.io/gh/como-ph/covidphtext/branch/master/graph/badge.svg)](https://codecov.io/gh/como-ph/covidphtext?branch=master)
[![CodeFactor](https://www.codefactor.io/repository/github/como-ph/covidphtext/badge)](https://www.codefactor.io/repository/github/como-ph/covidphtext)
[![DOI](https://zenodo.org/badge/265376181.svg)](https://zenodo.org/badge/latestdoi/265376181)
<!-- badges: end -->
To assess possible impact of various COVID-19 prediction models on Philippine government response, text from various resolutions issued by the Inter-agency Task Force for the Management of Emerging Infectious Diseases (IATF) has been collected using data mining approaches implemented in R. This package includes functions used for this data mining process and datasets of text that have been collected and processed for use in text analysis.
## Installation
`covidphtext` is not yet available on CRAN. It is currently in active development stage. Installation of `covidphtext` at this point is only possible through its development version via [GitHub](https://github.com/como-ph/covidphtext):
```{r install, echo = TRUE, eval = FALSE}
if(!require(remotes)) install.packages("remotes")
remotes::install_github("como-ph/covidphtext")
```
then load the package:
```{r load, echo = TRUE, eval = TRUE}
library(covidphtext)
```
## Usage
### Datasets
`covidphtext` currently has ```r nrow(utils::data(package = "covidphtext")$results)``` datasets of which ```r nrow(iatfLinksGazette[!is.na(iatfLinksGazette$id), ]) - 1``` are COVID-19-related resolutions and policies in the Philippines made by the Inter-Agency Task Force for the Management of Emerging Infectious Diseases (IATF), 1 is the Omnibus Guidelines on the Implementation of Community Quarantine in the Philippines released by the IATF and 2 are reference lists of links to these resolutions and guidelines.
A description of the available datasets can be found [here](https://como-ph.github.io/covidphtext/reference/index.html#section-datasets).
The IATF resolutions are officially available from two online sources: 1) The Department of Health (DoH) [website](http://www.doh.gov.ph/COVID-19/IATF-Resolutions/); and, 2) The Philippines Official Gazette [website](https://www.officialgazette.gov.ph/section/laws/other-issuances/inter-agency-task-force-for-the-management-of-emerging-infectious-diseases-resolutions/). The DOH website currently only holds IATF resolutions starting from resolution number 9 and later. The Official Gazette on the other hand contains resolutions 1 to the most current.
To get a list of the IATF resolutions that are available from the DOH website, the function `get_iatf_links()` can be used as follows:
```{r usage1, echo = TRUE, eval = TRUE}
get_iatf_links()
```
Given that the DOH website doesn't have the first 8 resolutions, this function will soon be deprecated in favour of the newer function below that interfaces with the Official Gazette.
A table of all the IATF resolutions and the URLs to download them can be generated using the newer function `get_iatf_gazette()` as follows:
```{r usage2, echo = TRUE, eval = FALSE}
list_iatf_pages(base = "https://www.officialgazette.gov.ph/section/laws/other-issuances/inter-agency-task-force-for-the-management-of-emerging-infectious-diseases-resolutions/",
pages = 1:6) %>%
get_iatf_pages() %>%
get_iatf_gazette()
```
```{r usage2a, echo = FALSE, eval = TRUE}
iatfLinksGazette
```
The actual PDF of the IATF resolutions/s can be downloaded using the `get_iatf_pdfs()` function. For example, to download IATF Resolution No. 29, the following command is issued:
```{r usage3, echo = TRUE, eval = TRUE}
get_iatf_pdfs(links = iatfLinks, id = 29)
```
The command downloads the PDF of the specified IATF Resolution into a temporary directory (using `tempdir()` function). The output of the `get_iatf_pdfs()` function is a named character vector of directory path/s to downloaded PDFs as shown above. The names of the character vector correspond to the resolution number. These paths can then be used when working with these files.
The `get_iatf_pdfs()` function interfaces with both the DOH and The Official Gazette website.
### Concatenating text datasets
The datasets described above can be processed and analysed on their own or as a combined corpus of text data. `covidphtext` provides convenience functions that concatenates all or specific text datasets available from the `covidphtext` package.
#### Concatenating datasets based on a specific search term
The `combine_docs` function allows the user to specify search terms to use in identifying datasets provided by the `covidphtext` package. The `docs` argument allows the specification of a vector of search terms to use to identify the names of datasets to concatenate. If the name/s of the datasets contain these search terms, the datasets with these name/s will be returned.
```{r usage5, echo = TRUE, eval = TRUE}
combine_docs(docs = "resolution")
```
The `combine_iatf` function is a specialised wrapper of the `combine_docs` function that specifically returns datasets containing IATF resolutions. An additional argument `res` allows users to specify which IATF resolutions to return. To get IATF resolution 10, 11, and 12, the following call to `combine_iatf` is made as follows:
```{r usage6, echo = TRUE, eval = TRUE}
combine_iatf(docs = "resolution", res = 10:12)
```
To check if only resolutions 10 to 12 have been returned:
```{r usage7, echo = TRUE, eval = TRUE}
combine_iatf(docs = "resolution", res = 10:12)[ , c("type", "id")]
```