Chapters 9 & 11 Fail to Knit #10

Open
alex-gable opened this issue Dec 21, 2020 · 7 comments

Comments

@alex-gable

alex-gable commented Dec 21, 2020

Problem

Chapters 9 and 11 fail to knit because of changes in dplyr (1.0) or broom (0.7).

I consulted this StackOverflow post for guidance (and provided my own solution) to fix the error, which occurs in the locations below.
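To see which side of those releases a local setup is on, a quick check like the following compares the installed versions with dplyr 1.0.0 and broom 0.7.0. Reading "1.0" and "0.7" as exactly 1.0.0 and 0.7.0 is an assumption about which release introduced the breakage.

```r
# Assumed thresholds: the dplyr and broom releases named above, read as x.y.0
thresholds <- c(dplyr = "1.0.0", broom = "0.7.0")

# TRUE means the installed package is at or past that release
newer <- vapply(names(thresholds), function(p) {
  utils::packageVersion(p) >= thresholds[[p]]
}, logical(1))

newer
```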

Example Solution

The documentation for do() says it has been superseded and recommends nest_by() instead. Conveniently, the documentation examples cover almost exactly this use case:

library(dplyr)
by_cyl <- mtcars %>% group_by(cyl)

# do() with named arguments becomes nest_by() + mutate() & list()
models <- by_cyl %>% do(mod = lm(mpg ~ disp, data = .))
# ->
models <- mtcars %>%
  nest_by(cyl) %>%
  mutate(mod = list(lm(mpg ~ disp, data = data)))
models %>% summarise(rsq = summary(mod)$r.squared)

# use broom to turn models into data
models %>% do(data.frame(
  var = names(coef(.$mod)),
  coef(summary(.$mod)))
)
# ->
if (requireNamespace("broom")) {
  models %>% summarise(broom::tidy(mod))
}
For the chunk containing errors on lines 400-414:

library(dplyr)
library(tidyr)  # spread()
library(broom)  # tidy(), glance()

regressions <- smallchart.long %>%
  nest_by(schoolid) %>% 
  mutate(fit = list(lm(MathAvgScore ~ year08, data=data)))

sd_filter <- smallchart.long %>%
  group_by(schoolid) %>%
  summarise(sds = sd(MathAvgScore)) 

regressions <- regressions %>%
  right_join(sd_filter, by="schoolid") %>%
  filter(!is.na(sds))

lm_info1 <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  select(schoolid, term, estimate) %>%
  spread(key = term, value = estimate) %>%
  rename(rate = year08, int = `(Intercept)`)

lm_info2 <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  select(schoolid, term, std.error) %>%
  spread(key = term, value = std.error) %>%
  rename(se_rate = year08, se_int = `(Intercept)`)

lm_info <- regressions %>%
  summarise(glance(fit)) %>%
  ungroup() %>%
  select(schoolid, r.squared, df.residual) %>%
  inner_join(lm_info1, by = "schoolid") %>%
  inner_join(lm_info2, by = "schoolid") %>%
  mutate(tstar = qt(.975, df.residual), 
         intlb = int - tstar * se_int, 
         intub = int + tstar * se_int,
         ratelb = rate - tstar * se_rate, 
         rateub = rate + tstar * se_rate)
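A self-contained sketch of the same pattern, using mtcars as a hypothetical stand-in for smallchart.long (one lm() per cyl group instead of per school). It also replaces the two spread() passes with a single pivot_wider() over both estimate and std.error; that is a choice made for this sketch, not the book's code.

```r
library(dplyr)
library(tidyr)
library(broom)

# Stand-in data: one lm() fit per cylinder group
regressions <- mtcars %>%
  nest_by(cyl) %>%
  mutate(fit = list(lm(mpg ~ wt, data = data)))

# One tidy() pass, widened on both estimate and std.error
coefs <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  dplyr::select(cyl, term, estimate, std.error) %>%
  pivot_wider(names_from = term, values_from = c(estimate, std.error)) %>%
  rename(int = `estimate_(Intercept)`, rate = estimate_wt,
         se_int = `std.error_(Intercept)`, se_rate = std.error_wt)

# Model-level statistics plus 95% t-based intervals, as in the chunk above
lm_info <- regressions %>%
  summarise(glance(fit)) %>%
  ungroup() %>%
  dplyr::select(cyl, r.squared, df.residual) %>%
  inner_join(coefs, by = "cyl") %>%
  mutate(tstar = qt(.975, df.residual),
         intlb = int - tstar * se_int,
         intub = int + tstar * se_int,
         ratelb = rate - tstar * se_rate,
         rateub = rate + tstar * se_rate)

lm_info
```

With dplyr 1.1.0 or later, summarise() warns when it returns more than one row per group and suggests reframe(); the code above still runs.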

This solution can be copied nearly line for line to fix the errors on lines 461-475.

Chapter 9 also fails to knit here because hcs.lme does not converge. Raising the optimizer's iteration limit to 500 (lmeControl(msMaxIter = 500)) seemed to do the trick:

library(nlme)

hcs.lme <- lme(MathAvgScore ~ year08 * charter, chart.long,
  random = ~ 1 | schoolid, na.action = na.exclude,
  correlation = corCompSymm(form = ~ 1 | schoolid),
  weights = varIdent(form = ~ 1 | year08),
  control = lmeControl(msMaxIter = 500))

summary(hcs.lme)
# Linear mixed-effects model fit by REML
#   Data: chart.long 
#       AIC     BIC  logLik
#   10299.2 10348.3 -5140.6
# 
# Random effects:
#  Formula: ~1 | schoolid
#         (Intercept) Residual
# StdDev: 0.002264717 6.534915
# 
# Correlation Structure: Compound symmetry
#  Formula: ~1 | schoolid 
#  Parameter estimate(s):
#      Rho 
# 0.8209145 
# Variance function:
#  Structure: Different standard deviations per stratum
#  Formula: ~1 | year08 
#  Parameter estimates:
#        0        1        2 
# 1.000000 1.127902 1.079423 
# Fixed effects:  MathAvgScore ~ year08 * charter 
#                   Value Std.Error   DF   t-value p-value
# (Intercept)    652.3347 0.2828597 1113 2306.2126  0.0000
# year08           1.1831 0.0907869 1113   13.0320  0.0000
# charter         -5.9106 0.8611940  616   -6.8633  0.0000
# year08:charter   0.8316 0.3032040 1113    2.7426  0.0062
#  Correlation: 
#                (Intr) year08 chartr
# year08         -0.208              
# charter        -0.328  0.068       
# year08:charter  0.062 -0.299 -0.308
# 
# Standardized Within-Group Residuals:
#        Min         Q1        Med         Q3        Max 
# -4.9760770 -0.4490767  0.0865079  0.5669240  3.0970658 
# 
# Number of Observations: 1733
# Number of Groups: 618 

hcs.lme$modelStruct
# reStruct  parameters:
#  schoolid 
# -7.967465 
# corStruct  parameters:
# [1] 1.998216
# varStruct  parameters:
# [1] 0.1203593 0.0764270

anova(hcs.lme, cs.lme)   # hcs not converging here
#         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
# hcs.lme     1  9 10299.20 10348.30 -5140.600                        
# cs.lme      2  7 10315.94 10354.13 -5150.973 1 vs 2 20.74528  <.0001
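For context, nlme documents the default msMaxIter as 50, so the fix above allows ten times as many iterations of the inner optimization step. A minimal sketch of building and passing the same control object, with nlme's built-in Orthodont data standing in for chart.long (the model itself is only illustrative):

```r
library(nlme)

# Raised limit used in the fix above; the default is reported alongside it
ctrl <- lmeControl(msMaxIter = 500)
c(default = lmeControl()$msMaxIter, raised = ctrl$msMaxIter)

# Random-intercept model with per-stratum residual variances (varIdent),
# fit with the raised iteration limit; Orthodont is a stand-in dataset
fit <- lme(distance ~ age * Sex, data = Orthodont,
           random = ~ 1 | Subject,
           weights = varIdent(form = ~ 1 | Sex),
           control = ctrl)

fit$modelStruct
```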

Finally, Chapter 11 is missing library(broom), and a handful of unqualified select() calls need a dplyr:: prefix.
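One common reason unqualified select() breaks is that MASS also exports a select() function; whichever package is attached last wins. Whether Chapter 11 actually attaches MASS is an assumption here. A minimal sketch of the masking and of the dplyr:: fix:

```r
library(dplyr)
library(broom)  # the missing library() call: tidy() and glance() live here
library(MASS)   # attached after dplyr, so MASS::select() masks dplyr::select()

# An unqualified select() now resolves to MASS's version
environmentName(environment(select))

# The dplyr:: prefix pins the intended function regardless of attach order
picked <- mtcars %>% dplyr::select(mpg, cyl)
names(picked)
```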

Hope this unsolicited help is, well, helpful!

@alex-gable alex-gable changed the title from "Chapter 9 Fails to Knit" to "Chapters 9 & 11 Fail to Knit" on Dec 21, 2020
@proback
Owner

proback commented Jan 14, 2021

Thanks much - this is very helpful! Because we had to freeze R package versions many months ago when the production process started, there will be inevitable issues with package updates. For now, I added a section to the Preface indicating which versions of which packages we used for this edition of the textbook, but I will definitely make your suggested changes in the next edition (or in periodic code updates).

@raffaem

raffaem commented Feb 21, 2021

@alex-gable Can you make a PR with this?

@proback Can we merge this? I am not able to compile the book to PDF.

@proback
Owner

proback commented Feb 22, 2021

@raffaem If you use the package versions listed in the preface are you able to compile the book?

@alex-gable
Author

alex-gable commented Feb 24, 2021

@raffaem Trying to be mindful and respectful of the fact that this is not my work, I've put my changes in alex-gable/BeyondMLR@b96ab33. The content blocks you're looking for are in chapters 6, 9, and 11 of that repo. What's relatively hidden among the changes, and only alluded to above, is the addition of new_session = TRUE to render_book() in knit.R. That change is what made some of the fixes I recommended above (and made in my branch) necessary.

@proback I want to emphasize that I'm trying not to break any rules in that branch (I've .gitignore'd all bookdown outputs), and I would love to contribute back anything I can. Let me know if there's anything you'd like me to change.

For comparison's sake, I've added the packages I currently use in the project. renv might be a very easy way to track these; I used its dependencies() function to introspect the project and compile this list.
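A minimal sketch of that renv::dependencies() introspection, run on a hypothetical throwaway project in tempdir() (the file name ch11.R is made up), plus the installed version of each package it finds:

```r
# Hypothetical toy project: one script that uses dplyr and broom
d <- file.path(tempdir(), "toy-project")
dir.create(d, showWarnings = FALSE)
writeLines(c("library(dplyr)",
             "fit <- lm(mpg ~ wt, data = mtcars)",
             "broom::tidy(fit)"),
           file.path(d, "ch11.R"))

# renv scans library() calls and pkg:: references in the project's files
deps <- renv::dependencies(path = d)
pkgs <- sort(unique(deps$Package))

# Installed version of each detected package (NA if not installed)
versions <- vapply(pkgs, function(p) {
  if (requireNamespace(p, quietly = TRUE)) {
    as.character(utils::packageVersion(p))
  } else {
    NA_character_
  }
}, character(1))

versions
```

For a project that should pin versions rather than just list them, renv::snapshot() writes a lockfile instead.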

@napaxton

napaxton commented Apr 22, 2021

Once you make the changes above (and those in Issue #12), almost everything works. I'm having a problem with the following code at lines 518-528 in Chapter 11:

regressions <- refdata %>% 
  group_by(game) %>% 
  mutate(fit = list(glm(foul.home ~ foul.diff, family = binomial, 
               data = .))) 

glm_info <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  dplyr::select(game, term, estimate) %>%
  spread(key = term, value = estimate) %>%
  rename(rate = foul.diff, int = `(Intercept)`)

This causes the following error:

Error: Problem with `summarise()` input `..1`.
x No tidy method recognized for this list.
ℹ Input `..1` is `tidy(fit)`.
ℹ The error occurred in group 1: game = 1.

The main difference seems to be glm() instead of lm()? Any other ideas? And how can we solve it?

@napaxton

napaxton commented Apr 23, 2021

OK, I seem to have solved it, in that the R code now compiles. I just needed to follow the rewrite in gable's Chapter 11 revisions more precisely.
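A likely reading of that fix, sketched on simulated data. In the failing chunk, group_by() + mutate(fit = list(...)) leaves fit as an ordinary list column, so inside summarise() tidy() receives a list (hence "No tidy method recognized for this list"), and data = . refers to the whole piped data frame rather than one game's rows. Switching to nest_by() and data = data, as in the first comment, avoids both. The data below is a hypothetical stand-in for refdata with made-up effect sizes; only the column names follow the original chunk.

```r
library(dplyr)
library(tidyr)
library(broom)

# Hypothetical stand-in for refdata: 3 games, 40 foul calls each
set.seed(1)
sim <- tibble(
  game = rep(1:3, each = 40),
  foul.diff = sample(-3:3, 120, replace = TRUE)
) %>%
  mutate(foul.home = rbinom(n(), 1, plogis(-0.3 * foul.diff)))

# nest_by() makes a rowwise frame, so `data` is each game's own rows
regressions <- sim %>%
  nest_by(game) %>%
  mutate(fit = list(glm(foul.home ~ foul.diff, family = binomial,
                        data = data)))

# Rowwise access unwraps `fit`, so tidy() sees a glm object, not a list
glm_info <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  dplyr::select(game, term, estimate) %>%
  spread(key = term, value = estimate) %>%
  rename(rate = foul.diff, int = `(Intercept)`)

glm_info
```

Only the model-fitting step changes; the glm_info pipeline is the same as in the failing chunk.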

I'm still having problems compiling from Rmd to HTML/PDF. Throwing up my hands for now and returning to this later.

@proback
Owner

proback commented Jul 22, 2022

I'm sorry for my silence; I've been pulled in several other directions over the past year. My new goal is to make a series of corrections and additions that I've been accumulating by the end of January 2023 (when I might actually have a small break to focus on this), possibly using Quarto. Feel free to share any other suggestions you have before then.


4 participants