covidphtext: Utilities to Extract Text Data From COVID-19-Related Resolutions and Policies From the Philippines


To assess the possible impact of various COVID-19 prediction models on the Philippine government's response, text from resolutions issued by the Inter-Agency Task Force for the Management of Emerging Infectious Diseases (IATF) has been collected using data mining approaches implemented in R. This package includes the functions used for this data mining process and the text datasets that have been collected and processed for use in text analysis.

Installation

covidphtext is not yet available on CRAN. It is currently in an active development stage. For now, covidphtext can only be installed as a development version from GitHub:

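# Install remotes first if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# Install the development version of covidphtext from GitHub
remotes::install_github("como-ph/covidphtext")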

Then load the package:

library(covidphtext)

Usage

Datasets

covidphtext currently has 69 datasets, of which 65 are COVID-19-related resolutions and policies in the Philippines made by the Inter-Agency Task Force for the Management of Emerging Infectious Diseases (IATF), 1 is the Omnibus Guidelines on the Implementation of Community Quarantine in the Philippines released by the IATF, and 2 are reference lists of links to these resolutions and guidelines.

A description of the available datasets can be found here.

The IATF resolutions are officially available from two online sources:

  1. The Department of Health (DOH) website; and
  2. The Philippines Official Gazette website.

The DOH website currently holds only IATF resolutions from resolution number 9 onwards. The Official Gazette, on the other hand, contains resolutions from number 1 to the most recent.

To get a list of the IATF resolutions that are available from the DOH website, the function get_iatf_links() can be used as follows:

get_iatf_links()
#> # A tibble: 74 x 7
#>       id title              date       source type  url               checked   
#>    <dbl> <chr>              <date>     <chr>  <chr> <chr>             <date>    
#>  1     9 Recommendations f… 2020-03-03 IATF   reso… https://doh.gov.… 2020-08-04
#>  2    10 Recommendations f… 2020-03-09 IATF   reso… https://doh.gov.… 2020-08-04
#>  3    11 Recommendations f… 2020-03-12 IATF   reso… https://doh.gov.… 2020-08-04
#>  4    12 Recommendations f… 2020-03-13 IATF   reso… https://doh.gov.… 2020-08-04
#>  5    13 Recommendations f… 2020-03-17 IATF   reso… https://doh.gov.… 2020-08-04
#>  6    14 Resolutions Relat… 2020-03-20 IATF   reso… https://doh.gov.… 2020-08-04
#>  7    15 Resolutions Relat… 2020-03-25 IATF   reso… https://doh.gov.… 2020-08-04
#>  8    16 Additional Guidel… 2020-03-30 IATF   reso… https://doh.gov.… 2020-08-04
#>  9    17 Recommendations R… 2020-03-30 IATF   reso… https://doh.gov.… 2020-08-04
#> 10    18 Recommendations R… 2020-04-01 IATF   reso… https://doh.gov.… 2020-08-04
#> # … with 64 more rows

Given that the DOH website doesn’t have the first 8 resolutions, this function will soon be deprecated in favour of the newer function below that interfaces with the Official Gazette.

A table of all the IATF resolutions and the URLs to download them can be generated using the newer function get_iatf_gazette() as follows:

list_iatf_pages(base = "https://www.officialgazette.gov.ph/section/laws/other-issuances/inter-agency-task-force-for-the-management-of-emerging-infectious-diseases-resolutions/", 
                pages = 1:6) %>%
  get_iatf_pages() %>%
  get_iatf_gazette()
#> # A tibble: 73 x 7
#>       id title              date       source type  url               checked   
#>    <dbl> <chr>              <date>     <chr>  <chr> <chr>             <date>    
#>  1    60 RESOLUTION NO. 60… 2020-07-30 IATF   reso… https://www.offi… 2020-08-02
#>  2    60 RESOLUTION NO. 60… 2020-07-30 IATF   reso… https://www.offi… 2020-08-02
#>  3    59 RESOLUTION NO. 59… 2020-07-28 IATF   reso… https://www.offi… 2020-08-02
#>  4    58 RESOLUTION NO. 58… 2020-07-23 IATF   reso… https://www.offi… 2020-08-02
#>  5    57 RESOLUTION NO. 57… 2020-07-21 IATF   reso… https://www.offi… 2020-08-02
#>  6    NA OMNIBUS GUIDELINE… 2020-07-16 IATF   reso… https://www.offi… 2020-08-02
#>  7    56 RESOLUTION NO. 56… 2020-07-16 IATF   reso… https://www.offi… 2020-08-02
#>  8    55 RESOLUTION NO. 55… 2020-07-14 IATF   reso… https://www.offi… 2020-08-02
#>  9    55 RESOLUTION NO. 55… 2020-07-14 IATF   reso… https://www.offi… 2020-08-02
#> 10    54 RESOLUTION NO. 54… 2020-07-11 IATF   reso… https://www.offi… 2020-08-02
#> # … with 63 more rows

The actual PDF/s of the IATF resolution/s can be downloaded using the get_iatf_pdfs() function. For example, to download IATF Resolution No. 29, issue the following command:

get_iatf_pdfs(links = iatfLinks, id = 29)
#>                                                                iatfResolution29 
#> "/var/folders/fk/s0yv8hhn2cs_nfsmzhm4dmhc0000gn/T//RtmpOtmTrk/file263d25fd8821"

The command downloads the PDF of the specified IATF resolution into a temporary directory (using the tempdir() function). The output of the get_iatf_pdfs() function is a named character vector of file path/s to the downloaded PDF/s, as shown above. The names of the character vector correspond to the resolution number/s. These paths can then be used when working with these files.
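As a minimal sketch of working with such paths, the example below builds a stand-in for the output of get_iatf_pdfs() (a named character vector of temporary file paths) and looks up a path by name. The objects pdf_paths, path_29 and keep_dir, and the empty placeholder file, are assumptions made for illustration only; nothing is downloaded here.

```r
# Hypothetical stand-in for the value returned by get_iatf_pdfs():
# a named character vector whose names identify the resolution and
# whose values are paths to files in the session's temporary directory.
pdf_paths <- c(iatfResolution29 = tempfile())

# Create an empty placeholder file so that the path exists (a real call
# would have downloaded the PDF to this location instead).
file.create(pdf_paths)

# Look up the path for a given resolution by name
path_29 <- pdf_paths[["iatfResolution29"]]

# Confirm that the file exists before passing the path on to other tools
file.exists(path_29)

# The session's temporary directory is removed when R exits, so files that
# should be kept need to be copied elsewhere. keep_dir is a placeholder
# inside tempdir() so that this sketch has no lasting side effects;
# replace it with a directory of your choice.
keep_dir <- file.path(tempdir(), "iatf_pdfs")
dir.create(keep_dir, showWarnings = FALSE)
file.copy(path_29, file.path(keep_dir, "iatfResolution29.pdf"), overwrite = TRUE)
```

With the package loaded, the same lookup and copy steps should apply to the vector returned by an actual get_iatf_pdfs() call.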

The get_iatf_pdfs() function interfaces with both the DOH and the Official Gazette websites.

Concatenating text datasets

The datasets described above can be processed and analysed on their own or as a combined corpus of text data. covidphtext provides convenience functions that concatenate all or specific text datasets available from the package.

Concatenating datasets based on a specific search term

The combine_docs function allows the user to specify search terms to use in identifying datasets provided by the covidphtext package. The docs argument allows the specification of a vector of search terms to use to identify the names of datasets to concatenate. If the name/s of the datasets contain these search terms, the datasets with these name/s will be returned.

combine_docs(docs = "resolution")
#> # A tibble: 7,728 x 7
#>    linenumber text                       source type       id section date      
#>         <int> <chr>                      <chr>  <chr>   <dbl> <chr>   <date>    
#>  1          1 Republic of the Philippin… IATF   resolu…     1 heading 2020-01-28
#>  2          2 Department of Health       IATF   resolu…     1 heading 2020-01-28
#>  3          3 Office of the Secretary    IATF   resolu…     1 heading 2020-01-28
#>  4          4 Inter-Agency Task Force f… IATF   resolu…     1 heading 2020-01-28
#>  5          5 Emerging Infectious Disea… IATF   resolu…     1 heading 2020-01-28
#>  6          6 28 January 2020            IATF   resolu…     1 heading 2020-01-28
#>  7          7 Resolution No. 01          IATF   resolu…     1 heading 2020-01-28
#>  8          8 Series of 2020             IATF   resolu…     1 heading 2020-01-28
#>  9          9 Recommendations for the M… IATF   resolu…     1 heading 2020-01-28
#> 10         10 Novel Coronavirus Situati… IATF   resolu…     1 heading 2020-01-28
#> # … with 7,718 more rows
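The name-matching idea described above can be sketched in base R as follows. This is not the package's implementation: the list docs_list, its dataset names and contents, and the choice of case-insensitive matching are all assumptions made for illustration.

```r
# Hypothetical stand-ins for text datasets; the names and contents are
# invented for illustration and are not taken from covidphtext.
docs_list <- list(
  iatfResolution01 = data.frame(
    linenumber = 1:2,
    text = c("Republic of the Philippines", "Resolution No. 01"),
    id = 1
  ),
  iatfResolution02 = data.frame(
    linenumber = 1:3,
    text = c("Republic of the Philippines", "Resolution No. 02", "Series of 2020"),
    id = 2
  ),
  iatfGuidelineOmnibus = data.frame(
    linenumber = 1L,
    text = "Omnibus Guidelines",
    id = NA_real_
  )
)

# One or more search terms, combined into a single regular expression
search_terms <- c("resolution")
pattern <- paste(search_terms, collapse = "|")

# Keep the names of datasets that contain any of the search terms
# (case-insensitive matching is an assumption of this sketch)
matched <- grep(pattern, names(docs_list), ignore.case = TRUE, value = TRUE)

# Row-bind the matching datasets into a single data frame
combined <- do.call(rbind, docs_list[matched])
rownames(combined) <- NULL

combined
```

Here only the two datasets whose names contain "resolution" are concatenated; the omnibus guidelines stand-in is left out.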

The combine_iatf function is a specialised wrapper of the combine_docs function that specifically returns datasets containing IATF resolutions. An additional argument res allows users to specify which IATF resolutions to return. To get IATF resolutions 10, 11, and 12, call combine_iatf as follows:

combine_iatf(docs = "resolution", res = 10:12)
#> # A tibble: 351 x 7
#>    linenumber text                        source type      id section date      
#>         <int> <chr>                       <chr>  <chr>  <dbl> <chr>   <date>    
#>  1          1 Republic of the Philippines IATF   resol…    10 heading 2020-03-09
#>  2          2 DOH DFA DILG DOJ DOLE DOT … IATF   resol…    10 heading 2020-03-09
#>  3          3 Inter-Agency Task Force fo… IATF   resol…    10 heading 2020-03-09
#>  4          4 Emerging Infectious Diseas… IATF   resol…    10 heading 2020-03-09
#>  5          5 Resolution No. 10           IATF   resol…    10 heading 2020-03-09
#>  6          6 Series of 2020              IATF   resol…    10 heading 2020-03-09
#>  7          7 9 March 2020                IATF   resol…    10 heading 2020-03-09
#>  8          8 Recommendations for the ma… IATF   resol…    10 heading 2020-03-09
#>  9          9 disease 2019 (COVID-19) si… IATF   resol…    10 heading 2020-03-09
#> 10         10 WHEREAS, Section 15 of Art… IATF   resol…    10 preamb… 2020-03-09
#> # … with 341 more rows

To check that only resolutions 10 to 12 have been returned:

combine_iatf(docs = "resolution", res = 10:12)[ , c("type", "id")]
#> # A tibble: 351 x 2
#>    type          id
#>    <chr>      <dbl>
#>  1 resolution    10
#>  2 resolution    10
#>  3 resolution    10
#>  4 resolution    10
#>  5 resolution    10
#>  6 resolution    10
#>  7 resolution    10
#>  8 resolution    10
#>  9 resolution    10
#> 10 resolution    10
#> # … with 341 more rows
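The printed output above shows only the first ten rows, all of which belong to resolution 10. A more direct check is to look at the distinct values of the id column. The sketch below uses a small stand-in data frame, res_text, in place of the result of combine_iatf(); its contents are invented for illustration.

```r
# Hypothetical stand-in for combine_iatf(docs = "resolution", res = 10:12);
# the row counts per resolution are invented for illustration.
res_text <- data.frame(
  type = "resolution",
  id   = rep(c(10, 11, 12), times = c(4, 3, 5))
)

# Distinct resolution numbers present in the result
ids_returned <- unique(res_text$id)
ids_returned

# TRUE only if every row belongs to one of the requested resolutions
all(res_text$id %in% 10:12)

# Number of lines of text per resolution
table(res_text$id)
```

With the package loaded, replacing res_text with the result of the combine_iatf() call should apply the same check to the real data.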