use dummy address components for better addr parsing of tiger street range names (#35)

* structure tiger street names for addr parsing

* print addr objects in color and with style to visually represent tags (#34)

* example code for working with s2 parent cells and mapping with rdeck (#33)

* parse tiger ranges into addr with dummy addr tags

* cleanup, update authors

---------

Co-authored-by: Cole Brokamp <[email protected]>
erikarasnick and cole-brokamp authored Oct 22, 2024
1 parent 0445d08 commit e337692
Showing 5 changed files with 16 additions and 8 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -2,8 +2,8 @@ Package: addr
 Title: Clean, Parse, Harmonize, Match, and Geocode Messy Real-World Addresses
 Version: 0.4.0.9020
 Authors@R:
-    person("Cole", "Brokamp", , "[email protected]", role = c("aut", "cre"),
-           comment = c(ORCID = "0000-0002-0289-3151"))
+    c(person("Cole", "Brokamp", email = "[email protected]", role = c("aut", "cre")),
+      person("Erika", "Manning", role = c("aut")))
 Description: Addresses that were not validated at the time of collection are often heterogenously formatted, making them difficult to compare or link to other sets of addresses. The addr package is designed to clean character strings of addresses, use the `usaddress` library to tag address components, and paste together select components to create a normalized address. Normalized addresses can be hashed to create hashdresses that can be used to merge with other sets of addresses.
 URL: https://github.com/cole-brokamp/addr, https://cole-brokamp.github.io/addr/
 BugReports: https://github.com/cole-brokamp/addr/issues
5 changes: 4 additions & 1 deletion R/addr_tiger_match.R
@@ -54,6 +54,9 @@ get_tiger_street_ranges <- function(county, year = "2022") {
 #' a street range tibble with zero rows indicates that although a street was matched,
 #' there was no range containing the street number
 #' @export
+#' @details
+#' To best parse street names and types, this function appends dummy address components just
+#' for the purposes of matching tiger street range names (e.g., `1234 {tiger_street_name} Anytown AB 00000`)
 #' @examples
 #' my_addr <- as_addr(c("224 Woolper Ave", "3333 Burnet Ave", "33333 Burnet Ave", "609 Walnut St"))
 #'
@@ -78,7 +81,7 @@ addr_match_tiger_street_ranges <- function(x,

   street_matches <-
     addr_match_street(ia,
-      suppressWarnings(as_addr(names(d_tiger))),
+      suppressWarnings(as_addr(glue::glue("1234 {names(d_tiger)} Anytown AB 00000"))),
       stringdist_match = "osa_lt_1",
       match_street_type = TRUE
     ) |>
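
The dummy-component approach described in the `@details` text above can be sketched in isolation. This is a minimal illustration rather than the package's implementation: `with_dummy_components()` and the sample street names are hypothetical, and base R `sprintf()` stands in for the `glue::glue()` call used in the commit.

```r
# Hypothetical helper (not part of addr): wrap bare street names in placeholder
# address components so an address parser sees a complete-looking address.
with_dummy_components <- function(street_names) {
  # "1234", "Anytown", "AB", and "00000" are the dummy values from the commit's template
  sprintf("1234 %s Anytown AB 00000", street_names)
}

# Illustrative street names; in the commit the names come from names(d_tiger)
street_names <- c("Burnet Ave", "Walnut St", "Woolper Ave")
dummy_addresses <- with_dummy_components(street_names)
print(dummy_addresses)
```

Strings built this way could then be passed to `as_addr()`, as the commit does, so that the street name and type are tagged in the context of a full address. The number, city, state, and ZIP code are placeholders only.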
4 changes: 4 additions & 0 deletions man/addr_match_tiger_street_ranges.Rd

Generated file; diff not rendered by default.

2 changes: 1 addition & 1 deletion tests/testthat/test-addr_match_geocode.R
@@ -8,7 +8,7 @@ test_that("addr_match_geocode() works", {
   table(my_geocodes$match_method) |>
     expect_identical(
       structure(
-        c(ref_addr = 216L, tiger_range = 11L, tiger_street = 2L, none = 22L),
+        c(ref_addr = 216L, tiger_range = 16L, tiger_street = 2L, none = 17L),
         dim = 4L,
         dimnames = structure(list(c("ref_addr", "tiger_range", "tiger_street", "none")), names = ""),
         class = "table"
9 changes: 5 additions & 4 deletions tests/testthat/test-addr_tiger_match.R
@@ -11,15 +11,16 @@ test_that("get_tiger_street_ranges() works", {
 })

 test_that("addr_match_tiger_street_ranges() works", {
-  addr_match_tiger_street_ranges(as_addr(c("224 Woolper Ave", "3333 Burnet Ave", "33333 Burnet Ave", "609 Walnut St")),
+  addr_match_tiger_street_ranges(as_addr(c("224 Woolper Ave", "3333 Burnet Ave", "33333 Burnet Ave", "609 Walnut St", "609 Weknut Street")),
     street_only_match = "none"
   ) |>
     purrr::map(nrow) |>
     expect_identical(list(
       `224 Woolper Avenue` = 1L,
       `3333 Burnet Avenue` = 2L,
       `33333 Burnet Avenue` = 0L,
-      `609 Walnut Street` = NULL
+      `609 Walnut Street` = 2L,
+      `609 Weknut Street` = NULL
     ))

   addr_match_tiger_street_ranges(as_addr(c("224 Woolper Ave", "3333 Burnet Ave", "33333 Burnet Ave", "609 Walnut St")),
@@ -30,7 +31,7 @@ test_that("addr_match_tiger_street_ranges() works", {
       `224 Woolper Avenue` = 1L,
       `3333 Burnet Avenue` = 2L,
       `33333 Burnet Avenue` = 20L,
-      `609 Walnut Street` = NULL
+      `609 Walnut Street` = 2L
     ))

   addr_match_tiger_street_ranges(as_addr(c("224 Woolper Ave", "3333 Burnet Ave", "33333 Burnet Ave", "609 Walnut St")),
@@ -41,6 +42,6 @@ test_that("addr_match_tiger_street_ranges() works", {
       `224 Woolper Avenue` = 1L,
       `3333 Burnet Avenue` = 2L,
       `33333 Burnet Avenue` = 1L,
-      `609 Walnut Street` = NULL
+      `609 Walnut Street` = 2L
     ))
 })
