diff --git a/DESCRIPTION b/DESCRIPTION
index a2b2434..5a0771b 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -13,7 +13,7 @@ BugReports: https://github.com/nhs-r-community/NHSRpopulation/issues
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.3
+RoxygenNote: 7.3.0
 Imports:
     cli,
     dplyr,
@@ -27,7 +27,9 @@ Depends:
 Suggests:
     rmarkdown,
     knitr,
-    testthat (>= 3.0.0)
+    testthat (>= 3.0.0),
+    purrr,
+    tibble
 Remotes: nhs-r-community/NHSRpostcodetools
 VignetteBuilder: knitr
 Config/testthat/edition: 3
diff --git a/README.Rmd b/README.Rmd
index c7b88ab..fc5566a 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -20,12 +20,7 @@ knitr::opts_chunk$set(
 The goal of `NHSRpopulation` is to make population estimates for **Lower layer Super Output Areas (LSOA)** and their **Indices of Multiple Deprivation (IMD)** easily available in R.
-Population estimates are broken down by age (0 to 90+) and gender (female/male).
-Information about the original sources of the data and a transparent description of all transformation of the data that is made available in this package can be found in this repository, see `"data-raw/imd.R` and `"data-raw/lsoa.R`.
-Main changes to the original data structures include (1) the transformation from wide to long data, (2) the addition of further information that was only available in variable names, and (3) renaming variables in a consistent way.
-
-The current version of this package only includes LSOA population estimates and IMD scores for the year 2019 for England.
-Because we store quite a lot in this package it currently relatively large (~9mb) compared to other packages.
+In its first iteration this package was data saved from [https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019](https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019) and has subsequently been moved to the API [https://services1.arcgis.com/] to keep the data up to date (although it only updated every few years) and give access to all the nations across the UK including Wales, Scotland, Northern Ireland as well as England.
 
 ## Installation
 
@@ -36,34 +31,6 @@ You can install the current version of `NHSRpopulation` from [GitHub](https://gi
 remotes::install_github("nhs-r-community/NHSRpopulation")
 ```
 
-## Example
-
-```{r}
-# Load the package
-library(NHSRpopulation)
-```
-
-### Lower layer Super Output Areas (LSOA)
-
-The LSOA population estimates are available in the dataset `lsoa`:
-
-```{r}
-# Show the first 6 rows of the dataset
-# For further information about this dataset see the help file: help(lsoa)
-head(lsoa)
-```
-
-### Indices of Multiple Deprivation (IMD)
-
-The IMD scores (raw scores and ranked deciles) and available in the dataset `imd`:
-
-```{r}
-# Show the first 6 rows of the dataset
-# For further information about this dataset see the help file: help(imd)
-head(imd)
-```
-
-
 ## Sources of Data
 
 The original source of the data provided in this R package is available [here](https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/lowersuperoutputareamidyearpopulationestimates) and licenced under the [Open Government Licence v3.0](http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/).
diff --git a/README.md b/README.md
index 9625a4d..43ff734 100644
--- a/README.md
+++ b/README.md
@@ -11,20 +11,14 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h
 The goal of `NHSRpopulation` is to make population estimates for **Lower
 layer Super Output Areas (LSOA)** and their **Indices of Multiple
-Deprivation (IMD)** easily available in R. Population estimates are
-broken down by age (0 to 90+) and gender (female/male). Information
-about the original sources of the data and a transparent description of
-all transformation of the data that is made available in this package
-can be found in this repository, see `"data-raw/imd.R` and
-`"data-raw/lsoa.R`. Main changes to the original data structures include
-(1) the transformation from wide to long data, (2) the addition of
-further information that was only available in variable names, and (3)
-renaming variables in a consistent way.
-
-The current version of this package only includes LSOA population
-estimates and IMD scores for the year 2019 for England. Because we store
-quite a lot in this package it currently relatively large (~9mb)
-compared to other packages.
+Deprivation (IMD)** easily available in R. In its first iteration this
+package was data saved from
+
+and has subsequently been moved to the API
+\[\] to keep the data up to date
+(although it only updated every few years) and give access to all the
+nations across the UK including Wales, Scotland, Northern Ireland as
+well as England.
 ## Installation
 
@@ -36,66 +30,6 @@ You can install the current version of `NHSRpopulation` from
 remotes::install_github("nhs-r-community/NHSRpopulation")
 ```
 
-## Example
-
-``` r
-# Load the package
-library(NHSRpopulation)
-#>
-#> ── This is NHSRpopulation 0.0.2 ────────────────────────────────────────────────
-#> ℹ Please report any issues or ideas at:
-#> ℹ https://github.com/nhs-r-community/NHSRpopulation/issues
-```
-
-### Lower layer Super Output Areas (LSOA)
-
-The LSOA population estimates are available in the dataset `lsoa`:
-
-``` r
-# Show the first 6 rows of the dataset
-# For further information about this dataset see the help file: help(lsoa)
-head(lsoa)
-#>   lsoa_year lsoa_code           lsoa_name la_year   la_code        la_name age
-#> 1      2019 E01000001 City of London 001A    2019 E09000001 City of London   0
-#> 2      2019 E01000001 City of London 001A    2019 E09000001 City of London   1
-#> 3      2019 E01000001 City of London 001A    2019 E09000001 City of London   2
-#> 4      2019 E01000001 City of London 001A    2019 E09000001 City of London   3
-#> 5      2019 E01000001 City of London 001A    2019 E09000001 City of London   4
-#> 6      2019 E01000001 City of London 001A    2019 E09000001 City of London   5
-#>   gender est_year  n
-#> 1      f     2019  2
-#> 2      f     2019  9
-#> 3      f     2019  4
-#> 4      f     2019 12
-#> 5      f     2019 11
-#> 6      f     2019  5
-```
-
-### Indices of Multiple Deprivation (IMD)
-
-The IMD scores (raw scores and ranked deciles) and available in the
-dataset `imd`:
-
-``` r
-# Show the first 6 rows of the dataset
-# For further information about this dataset see the help file: help(imd)
-head(imd)
-#>   lsoa_year lsoa_code                 lsoa_name la_year   la_code
-#> 1      2011 E01000001       City of London 001A    2019 E09000001
-#> 2      2011 E01000002       City of London 001B    2019 E09000001
-#> 3      2011 E01000003       City of London 001C    2019 E09000001
-#> 4      2011 E01000005       City of London 001E    2019 E09000001
-#> 5      2011 E01000006 Barking and Dagenham 016A    2019 E09000002
-#> 6      2011 E01000007 Barking and Dagenham 015A    2019 E09000002
-#>                la_name imd_year imd_score imd_decile
-#> 1       City of London     2019     6.208          9
-#> 2       City of London     2019     5.143         10
-#> 3       City of London     2019    19.402          5
-#> 4       City of London     2019    28.652          3
-#> 5 Barking and Dagenham     2019    19.837          5
-#> 6 Barking and Dagenham     2019    31.576          3
-```
-
 ## Sources of Data
 
 The original source of the data provided in this R package is available
diff --git a/vignettes/get-started.Rmd b/vignettes/get-started.Rmd
new file mode 100644
index 0000000..cde5861
--- /dev/null
+++ b/vignettes/get-started.Rmd
@@ -0,0 +1,112 @@
+---
+title: "Getting started using the package"
+output: rmarkdown::html_vignette
+bibliography: "references.bib"
+link-citations: TRUE
+vignette: >
+  %\VignetteIndexEntry{get-started}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  echo = TRUE,
+  eval = FALSE,
+  comment = "#>",
+  fig.path = "man/figures/README-",
+  out.width = "100%"
+)
+```
+
+### Indices of Multiple Deprivation (IMD)
+
+To get the IMD scores (raw scores and ranked deciles) for a dataset run the following code to generate some random example postcodes:
+
+```{r}
+library(purrr)
+library(tibble)
+library(PostcodesioR)
+library(NHSRpopulation)
+
+postcodes <- purrr::map_chr(
+  1:10,
+  .f = ~PostcodesioR::random_postcode() |>
+    purrr::pluck(1)
+)
+
+tibble_postcodes <- postcodes |>
+  tibble::as_tibble()
+```
+
+Then, using the `get_imd()` function for a vector (returning just the first five columns):
+
+```{r}
+NHSRpopulation::get_imd(postcodes) |>
+  dplyr::select(1:5)
+```
+
+Or with a data frame (returning just the first five columns):
+
+```{r}
+NHSRpopulation::get_imd(tibble_postcodes$value) |>
+  dplyr::select(1:5)
+```
+
+This function can be used to fix missing postcodes as some are terminated or are invalid:
+
+```{r}
+postcodes <- c("HD1 2UT", "HD1 2UU", "HD1 2UV")
+
+NHSRpopulation::get_imd(postcodes) |>
+  dplyr::select(1:5)
+
+```
+
+Currently, although the postcode is fixed with the column `new_postcode` the IMD is not overwritten.
+
+## Lower Super Output area (LSOA)
+
+To return the `IMD`, `imd_decile` and `imd_quintile` for LSOAs this can be as a vector:
+
+```{r}
+# Example LSOAs from each England Decile group
+lsoa_imd <- c("E01000002",
+              "E01000001",
+              "E01000117",
+              "E01000119",
+              "E01000069",
+              "E01000070",
+              "E01000066",
+              "E01000005",
+              "E01000008",
+              "E01000048")
+
+NHSRpopulation::get_lsoa(lsoa_imd) |>
+  head(10) # first 10 rows
+```
+
+Or from a data frame:
+
+```{r}
+
+tibble_lsoa_imd <- lsoa_imd |>
+  tibble::as_tibble()
+
+NHSRpopulation::get_lsoa(tibble_lsoa_imd$value) |>
+  head(10)
+
+```
+
+The functions return everything in those LSOAs and if you would like to return some random postcodes from each decile:
+
+```{r}
+NHSRpopulation::get_lsoa(lsoa_imd, return = "random")
+```
+
+Or just the first postcode that appears in each decile:
+
+```{r}
+NHSRpopulation::get_lsoa(lsoa_imd, return = "first")
+```