Merge pull request #28 from nhs-r-community/update-readme

Update readme
nhs-r-community · Jan 22, 2024 · 9a8d9b8
2 parents cbbe8e3 + a8e8632

Showing 4 changed files with 125 additions and 110 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -13,7 +13,7 @@ BugReports: https://github.com/nhs-r-community/NHSRpopulation/issues
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.3
+RoxygenNote: 7.3.0
 Imports: 
     cli,
     dplyr,
@@ -27,7 +27,9 @@ Depends:
 Suggests: 
     rmarkdown,
     knitr,
-    testthat (>= 3.0.0)
+    testthat (>= 3.0.0),
+    purrr,
+    tibble
 Remotes: nhs-r-community/NHSRpostcodetools
 VignetteBuilder: knitr
 Config/testthat/edition: 3
diff --git a/README.Rmd b/README.Rmd
@@ -20,12 +20,7 @@ knitr::opts_chunk$set(
 <!-- badges: end -->
 
 The goal of `NHSRpopulation` is to make population estimates for **Lower layer Super Output Areas (LSOA)** and their **Indices of Multiple Deprivation (IMD)** easily available in R.
-Population estimates are broken down by age (0 to 90+) and gender (female/male).
-Information about the original sources of the data and a transparent description of all transformation of the data that is made available in this package can be found in this repository, see `"data-raw/imd.R` and `"data-raw/lsoa.R`.
-Main changes to the original data structures include (1) the transformation from wide to long data, (2) the addition of further information that was only available in variable names, and (3) renaming variables in a consistent way.
-
-The current version of this package only includes LSOA population estimates and IMD scores for the year 2019 for England.
-Because we store quite a lot in this package it currently relatively large (~9mb) compared to other packages.
+In its first iteration this package was data saved from [https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019](https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019) and has subsequently been moved to the API [https://services1.arcgis.com/] to keep the data up to date (although it only updated every few years) and give access to all the nations across the UK including Wales, Scotland, Northern Ireland as well as England.
 
 ## Installation
 
@@ -36,34 +31,6 @@ You can install the current version of `NHSRpopulation` from [GitHub](https://gi
 remotes::install_github("nhs-r-community/NHSRpopulation")
 ```
 
-## Example
-
-```{r}
-# Load the package
-library(NHSRpopulation)
-```
-
-### Lower layer Super Output Areas (LSOA)
-
-The LSOA population estimates are available in the dataset `lsoa`:
-
-```{r}
-# Show the first 6 rows of the dataset
-# For further information about this dataset see the help file: help(lsoa)
-head(lsoa)
-```
-
-### Indices of Multiple Deprivation (IMD)
-
-The IMD scores (raw scores and ranked deciles) and available in the dataset `imd`:
-
-```{r}
-# Show the first 6 rows of the dataset
-# For further information about this dataset see the help file: help(imd)
-head(imd)
-```
-
-
 ## Sources of Data
 
 The original source of the data provided in this R package is available [here](https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/lowersuperoutputareamidyearpopulationestimates) and licenced under the [Open Government Licence v3.0](http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/).

diff --git a/README.md b/README.md
@@ -11,20 +11,14 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h
 
 The goal of `NHSRpopulation` is to make population estimates for **Lower
 layer Super Output Areas (LSOA)** and their **Indices of Multiple
-Deprivation (IMD)** easily available in R. Population estimates are
-broken down by age (0 to 90+) and gender (female/male). Information
-about the original sources of the data and a transparent description of
-all transformation of the data that is made available in this package
-can be found in this repository, see `"data-raw/imd.R` and
-`"data-raw/lsoa.R`. Main changes to the original data structures include
-(1) the transformation from wide to long data, (2) the addition of
-further information that was only available in variable names, and (3)
-renaming variables in a consistent way.
-
-The current version of this package only includes LSOA population
-estimates and IMD scores for the year 2019 for England. Because we store
-quite a lot in this package it currently relatively large (~9mb)
-compared to other packages.
+Deprivation (IMD)** easily available in R. In its first iteration this
+package was data saved from
+<https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019>
+and has subsequently been moved to the API
+\[<https://services1.arcgis.com/>\] to keep the data up to date
+(although it only updated every few years) and give access to all the
+nations across the UK including Wales, Scotland, Northern Ireland as
+well as England.
 
 ## Installation
 
@@ -36,66 +30,6 @@ You can install the current version of `NHSRpopulation` from
 remotes::install_github("nhs-r-community/NHSRpopulation")
 ```
 
-## Example
-
-``` r
-# Load the package
-library(NHSRpopulation)
-#> 
-#> ── This is NHSRpopulation 0.0.2 ────────────────────────────────────────────────
-#> ℹ Please report any issues or ideas at:
-#> ℹ https://github.com/nhs-r-community/NHSRpopulation/issues
-```
-
-### Lower layer Super Output Areas (LSOA)
-
-The LSOA population estimates are available in the dataset `lsoa`:
-
-``` r
-# Show the first 6 rows of the dataset
-# For further information about this dataset see the help file: help(lsoa)
-head(lsoa)
-#>   lsoa_year lsoa_code           lsoa_name la_year   la_code        la_name age
-#> 1      2019 E01000001 City of London 001A    2019 E09000001 City of London   0
-#> 2      2019 E01000001 City of London 001A    2019 E09000001 City of London   1
-#> 3      2019 E01000001 City of London 001A    2019 E09000001 City of London   2
-#> 4      2019 E01000001 City of London 001A    2019 E09000001 City of London   3
-#> 5      2019 E01000001 City of London 001A    2019 E09000001 City of London   4
-#> 6      2019 E01000001 City of London 001A    2019 E09000001 City of London   5
-#>   gender est_year  n
-#> 1      f     2019  2
-#> 2      f     2019  9
-#> 3      f     2019  4
-#> 4      f     2019 12
-#> 5      f     2019 11
-#> 6      f     2019  5
-```
-
-### Indices of Multiple Deprivation (IMD)
-
-The IMD scores (raw scores and ranked deciles) and available in the
-dataset `imd`:
-
-``` r
-# Show the first 6 rows of the dataset
-# For further information about this dataset see the help file: help(imd)
-head(imd)
-#>   lsoa_year lsoa_code                 lsoa_name la_year   la_code
-#> 1      2011 E01000001       City of London 001A    2019 E09000001
-#> 2      2011 E01000002       City of London 001B    2019 E09000001
-#> 3      2011 E01000003       City of London 001C    2019 E09000001
-#> 4      2011 E01000005       City of London 001E    2019 E09000001
-#> 5      2011 E01000006 Barking and Dagenham 016A    2019 E09000002
-#> 6      2011 E01000007 Barking and Dagenham 015A    2019 E09000002
-#>                la_name imd_year imd_score imd_decile
-#> 1       City of London     2019     6.208          9
-#> 2       City of London     2019     5.143         10
-#> 3       City of London     2019    19.402          5
-#> 4       City of London     2019    28.652          3
-#> 5 Barking and Dagenham     2019    19.837          5
-#> 6 Barking and Dagenham     2019    31.576          3
-```
-
 ## Sources of Data
 
 The original source of the data provided in this R package is available

diff --git a/vignettes/get-started.Rmd b/vignettes/get-started.Rmd
@@ -0,0 +1,112 @@
+---
+title: "Getting started using the package"
+output: rmarkdown::html_vignette
+bibliography: "references.bib"
+link-citations: TRUE
+vignette: >
+  %\VignetteIndexEntry{get-started}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  echo = TRUE,
+  eval = FALSE,
+  comment = "#>",
+  fig.path = "man/figures/README-",
+  out.width = "100%"
+)
+```
+
+### Indices of Multiple Deprivation (IMD)
+
+To get the IMD scores (raw scores and ranked deciles) for a dataset run the following code to generate some random example postcodes:
+
+```{r}
+library(purrr)
+library(tibble)
+library(PostcodesioR)
+library(NHSRpopulation)
+
+postcodes <- purrr::map_chr(
+  1:10,
+  .f = ~PostcodesioR::random_postcode() |> 
+    purrr::pluck(1) 
+) 
+
+tibble_postcodes <- postcodes |> 
+  tibble::as_tibble()
+```
+
+Then, using the `get_imd()` function for a vector (returning just the first five columns):
+
+```{r}
+NHSRpopulation::get_imd(postcodes) |> 
+  dplyr::select(1:5)
+```
+
+Or with a data frame (returning just the first five columns):
+
+```{r}
+NHSRpopulation::get_imd(tibble_postcodes$value) |> 
+  dplyr::select(1:5)
+```
+
+This function can be used to fix missing postcodes as some are terminated or are invalid:
+
+```{r}
+postcodes <- c("HD1 2UT", "HD1 2UU", "HD1 2UV")
+
+NHSRpopulation::get_imd(postcodes) |> 
+  dplyr::select(1:5)
+
+```
+
+Currently, although the postcode is fixed with the column `new_postcode` the IMD is not overwritten.
+
+## Lower Super Output area (LSOA)
+
+To return the `IMD`, `imd_decile` and `imd_quintile` for LSOAs this can be as a vector:
+
+```{r}
+# Example LSOAs from each England Decile group
+lsoa_imd <- c("E01000002",
+              "E01000001",
+              "E01000117",
+              "E01000119",
+              "E01000069",
+              "E01000070",
+              "E01000066",
+              "E01000005",
+              "E01000008",
+              "E01000048")
+
+NHSRpopulation::get_lsoa(lsoa_imd) |> 
+  head(10) # first 10 rows
+```
+
+Or from a data frame:
+
+```{r}
+
+tibble_lsoa_imd <- lsoa_imd |> 
+  tibble::as_tibble()
+
+NHSRpopulation::get_lsoa(tibble_lsoa_imd$value) |> 
+  head(10)
+
+```
+
+The functions return everything in those LSOAs and if you would like to return some random postcodes from each decile:
+
+```{r}
+NHSRpopulation::get_lsoa(lsoa_imd, return = "random")
+```
+
+Or just the first postcode that appears in each decile:
+
+```{r}
+NHSRpopulation::get_lsoa(lsoa_imd, return = "first")
+```