Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make recipe() works with long formulas #1283

Merged
merged 14 commits into from
May 24, 2024
Merged

Conversation

EmilHvitfeldt
Copy link
Member

@EmilHvitfeldt EmilHvitfeldt commented Feb 2, 2024

I learned about all.names() a couple of days ago. And it works wonders in recipes!

to close #1279

The previous issues we had with long formulas was that fun_calls() was recursive, and it just couldn't handle that large formulas. People were seeing issues with as low as 200 columns.

df <- matrix(1:10000, ncol = 10000)
df <- as.data.frame(df)
names(df) <- c(paste0("x", 1:10000))

library(recipes)
long_formula <- as.formula(paste("~ ", paste(names(df), collapse = " + ")))

rec <- recipe(long_formula, df)

rec |> prep() |> bake(df)
#> # A tibble: 1 × 10,000
#>      x1    x2    x3    x4    x5    x6    x7    x8    x9   x10   x11   x12   x13
#>   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1     1     2     3     4     5     6     7     8     9    10    11    12    13
#> # ℹ 9,987 more variables: x14 <int>, x15 <int>, x16 <int>, x17 <int>,
#> #   x18 <int>, x19 <int>, x20 <int>, x21 <int>, x22 <int>, x23 <int>,
#> #   x24 <int>, x25 <int>, x26 <int>, x27 <int>, x28 <int>, x29 <int>,
#> #   x30 <int>, x31 <int>, x32 <int>, x33 <int>, x34 <int>, x35 <int>,
#> #   x36 <int>, x37 <int>, x38 <int>, x39 <int>, x40 <int>, x41 <int>,
#> #   x42 <int>, x43 <int>, x44 <int>, x45 <int>, x46 <int>, x47 <int>,
#> #   x48 <int>, x49 <int>, x50 <int>, x51 <int>, x52 <int>, x53 <int>, …

Created on 2024-05-23 with reprex v2.1.0

@EmilHvitfeldt
Copy link
Member Author

Some of these CI failure also appear on dev branch

predictor_names <- names(attr(data_info, "dataClasses"))
if (length(response_info) > 0 && all(response_info > 0)) {
predictor_names <- predictor_names[-response_info]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are allowed to do this refactor because get_rhs_vars() is only ever called after inline_check() in form2args()

https://github.com/tidymodels/recipes/blob/main/R/recipe.R#L233-L241

the the question in essence because:

  1. if formula contains a ., use "all names" otherwise use all.names() to pull them out.
  2. pull names out of lhs
  3. take the diff if needed

})

test_that("recipe() can handle very long formulas (#1283)", {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is simply here to demonstrate that we doesn't get a Error: C stack usage 7954280 is too close to the limit error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, that's a lot of columns. Hell yeah!

@EmilHvitfeldt EmilHvitfeldt requested a review from simonpcouch May 24, 2024 00:27
@EmilHvitfeldt EmilHvitfeldt changed the title rewrite fun_calls() to not use recursion make recipe() works with long formulas May 24, 2024
Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so nice! Leaving review as "Comment" rather than "Approve" just so I can revisit and make sure I understand the situations where this is applied.

NEWS.md Outdated Show resolved Hide resolved
R/misc.R Show resolved Hide resolved
R/recipe.R Outdated Show resolved Hide resolved
R/recipe.R Outdated Show resolved Hide resolved
R/recipe.R Outdated Show resolved Hide resolved
tests/testthat/test-basics.R Show resolved Hide resolved
})

test_that("recipe() can handle very long formulas (#1283)", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, that's a lot of columns. Hell yeah!

R/misc.R Outdated Show resolved Hide resolved
@EmilHvitfeldt EmilHvitfeldt requested a review from simonpcouch May 24, 2024 17:07
Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@EmilHvitfeldt EmilHvitfeldt merged commit 922ca84 into main May 24, 2024
9 checks passed
@EmilHvitfeldt EmilHvitfeldt deleted the allow-super-long-formulas branch May 24, 2024 17:19
Copy link

github-actions bot commented Jun 8, 2024

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Long formulas cause recipe() to error due to C stack usage
2 participants