New step: step_dummy_manual() #1085

Open
EmilHvitfeldt opened this issue Feb 8, 2023 · 1 comment
Labels
feature a feature request or enhancement new steps tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day

EmilHvitfeldt commented Feb 8, 2023

Sometimes a factor has a limited number of levels and you want to encode them in a specific way, as in the example below. Essentially, the step should perform a left_join() under the hood, with special care taken for new and missing values.

library(recipes)

my_data <- tibble(
  choice = c("none", "A", "B", "both", "A", "none")
)

encoding <- tribble(
  ~value, ~A, ~B,
  "none",  0,  0,
  "A",     1,  0,
  "B",     0,  1,
  "both",  1,  1
)

recipe(~., data = my_data) %>%
  step_dummy_manual(choice, encoding = encoding) %>%
  prep() %>%
  bake(new_data = NULL)
#> # A tibble: 6 × 2
#>   choice_A choice_B
#>      <dbl>    <dbl>
#> 1        0        0
#> 2        1        0
#> 3        0        1
#> 4        1        1
#> 5        1        0
#> 6        0        0
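
A minimal sketch of the join described above, assuming dplyr is available. `dummy_manual_sketch()` is a hypothetical helper, not an existing recipes function, and the extra `"C"` and `NA` rows are added here only to show what a plain `left_join()` does with new and missing values: both end up as `NA` in every encoded column, which is the case a real step would need to handle deliberately.

```r
library(dplyr)

# Same data as above, plus one unseen level ("C") and one missing value
my_data <- tibble(
  choice = c("none", "A", "B", "both", "A", "none", "C", NA)
)

encoding <- tribble(
  ~value, ~A, ~B,
  "none",  0,  0,
  "A",     1,  0,
  "B",     0,  1,
  "both",  1,  1
)

# Hypothetical helper (an assumption, not part of recipes):
# joins the encoding table onto `col` and drops the original column.
dummy_manual_sketch <- function(data, col, encoding) {
  # Prefix the encoding columns with the original column name,
  # e.g. A -> choice_A, leaving the key column `value` untouched
  enc <- rename_with(encoding, ~ paste(col, .x, sep = "_"), .cols = -value)

  data %>%
    left_join(enc, by = setNames("value", col)) %>%
    select(-all_of(col))
}

res <- dummy_manual_sketch(my_data, "choice", encoding)
res
```

The last two rows of `res` are `NA` in both `choice_A` and `choice_B`, because neither `"C"` nor `NA` appears in the encoding table.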

Another, more realistic example would be compass directions stored as a factor, as we saw in https://www.kaggle.com/competitions/sliced-s01e04-knyna9

where you could do something like this:

encoding <- tribble(
  ~value, ~lon, ~lat,
  "N",       0,  1,
  "E",       1,  0,
  "S",       0, -1,
  "W",      -1,  0
)

A weather example:

encoding <- tribble(
  ~value,          ~sun, ~cloud, ~rain,
  "sunny",            1,      0,     0,
  "partly clouded",   1,      1,     0,
  "clouded",          0,      1,     0,
  "raining",          0,      1,     1
)

EmilHvitfeldt added the new steps and feature (a feature request or enhancement) labels Feb 8, 2023
EmilHvitfeldt added the tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day label Jul 16, 2024

EmilHvitfeldt commented

This could also be a way to bring in data from outside the original table (enrichment). For example, city names could be replaced or enriched with city characteristics.
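
A rough sketch of what such enrichment could look like with a plain `left_join()`, assuming dplyr is available. The table `city_info`, its columns, and all city names and values are made-up placeholders for illustration only. Here the original column is kept, so the lookup enriches rather than replaces it.

```r
library(dplyr)

# Hypothetical lookup table; names and values are placeholders, not real data
city_info <- tribble(
  ~city,    ~coastal, ~capital,
  "city_a",        1,        0,
  "city_b",        0,        1
)

# "city_c" is not in the lookup table
visits <- tibble(city = c("city_b", "city_a", "city_c"))

# Keep `city` and add the characteristics next to it;
# unmatched cities get NA in every added column
enriched <- left_join(visits, city_info, by = "city")
enriched
```

As with the encoding examples, a city that is missing from the lookup table ends up as `NA` in the added columns, so the same question of how to treat new values applies.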
