Plot variance of predicted values after imputation #97

Draft: wants to merge 10 commits into main


@KyuriP commented Mar 23, 2023

Gerko's new variance plot idea.

@hanneoberman
Member

@KyuriP Thank you for your contribution! As discussed, I've incorporated your code into the existing ggmice function plot_variance(). To keep it flexible with respect to the different analyses that mira objects can contain, the observed data is not plotted. Instead, the row number is plotted on the y axis, just as for mids objects visualized with this function:

library(ggmice)
imp <- mice::mice(mice::nhanes, printFlag = FALSE)
plot_variance(imp)

Created on 2023-03-27 with reprex v2.0.2

library(ggmice)
imp <- mice::mice(mice::nhanes, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ age))
plot_variance(fit)

Created on 2023-03-27 with reprex v2.0.2

Adjustments/suggestions are welcome! (cc @gerkovink )

@hanneoberman
Member

hanneoberman commented Mar 29, 2023

Use broom to extract the residuals and plot the predicted values against the observed data (and the average imputed data) instead.
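A minimal sketch of that idea, assuming the `nhanes` data and an `lm(bmi ~ age)` analysis as in the earlier reprex. `broom::augment()` (broom is already imported by mice) returns `.fitted` and `.resid` for every record of each completed dataset, so per-record summaries across the `m` imputations follow directly. The column names `avg`, `vrn` and `avg_resid` are illustrative, not an existing ggmice interface.

```r
library(mice, warn.conflicts = FALSE)

# Impute nhanes and fit the same model in every completed dataset
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ age))

# broom::augment() gives one row per record with .fitted and .resid columns
aug <- lapply(fit$analyses, broom::augment)

# n x m matrices: rows are records, columns are imputations
fitted_mat <- sapply(aug, function(a) a$.fitted)
resid_mat <- sapply(aug, function(a) a$.resid)

# Per-record summaries across the m imputations
pred <- data.frame(
  .row = seq_len(nrow(fitted_mat)),
  observed = nhanes$bmi,             # NA where bmi was originally missing
  avg = rowMeans(fitted_mat),        # average predicted value
  vrn = apply(fitted_mat, 1, var),   # between-imputation variance of the prediction
  avg_resid = rowMeans(resid_mat)    # average residual
)
head(pred)
```

Because every completed dataset contains the same 25 records in the same order, the columns of `fitted_mat` line up by row, so `vrn` is the between-imputation variance of each record's prediction.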

@hanneoberman
Member

Thanks a bunch! I still have two questions:

  • why is the scale of the variability categorical, and not continuous?
  • are the warnings expected behaviour?
library(ggmice)
library(mice)
#> 
#> Attaching package: 'mice'
#> The following objects are masked from 'package:ggmice':
#> 
#>     bwplot, densityplot, stripplot, xyplot
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
mira <- with(mice(nhanes, print = FALSE), lm(bmi~chl))
plot_variance(mira)
#> Warning: Removed 9 rows containing missing values (`geom_point()`).

Created on 2023-04-11 with reprex v2.0.2

@gerkovink
Member

One issue: this function does not support the mild workflow (e.g. purrr::map()) advocated here.

library(mice, warn.conflicts = FALSE)
library(ggmice, warn.conflicts = FALSE)
library(magrittr)
library(purrr)

# mild workflow with purrr::map()
mild_mira <- 
  nhanes %>% 
  mice(print = FALSE) %>% 
  complete("all") %>% 
  map(~.x %$% lm(bmi~chl))
plot_variance(mild_mira)
#> Error in plot_variance(mild_mira): Input is not a Multiply Imputed Data Set of class `mids`/ `mira`. 
#> 
#>          Perhaps function mice::as.mids() can be of use?

This error message is somewhat informative, but not sufficiently so: it should point towards with() (i.e. mice::with.mids()). Moreover, since we also advocate the mapped workflow in mice, ggmice should support that workflow as well.

The with workflow works without fail:

# regular workflow
mira <- with(mice(nhanes, 
                  print = FALSE), 
             lm(bmi~chl))
plot_variance(mira)
#> Warning: Removed 9 rows containing missing values (`geom_point()`).

Now, the interesting thing is that mild_mira and mira hold the same list of fitted models; mira merely wraps that list (in its analyses element) together with call and class information. mice::pool() does not care about this difference, so perhaps we can borrow a solution from that functionality:

# pooling
pool(mira)
#> Class: mipo    m = 5 
#>          term m    estimate         ubar            b            t dfcom
#> 1 (Intercept) 5 20.35823098 1.530174e+01 5.1736784106 2.151015e+01    23
#> 2         chl 5  0.03256681 4.122791e-04 0.0001760493 6.235383e-04    23
#>         df       riv    lambda       fmi
#> 1 11.48917 0.4057326 0.2886272 0.3868209
#> 2 10.00654 0.5124178 0.3388070 0.4404779
pool(mild_mira)
#> Class: mipo    m = 5 
#>          term m    estimate         ubar            b            t dfcom
#> 1 (Intercept) 5 22.77324050 1.472193e+01 4.800490e-01 1.529799e+01    23
#> 2         chl 5  0.01958118 3.950836e-04 2.448063e-05 4.244603e-04    23
#>         df        riv     lambda       fmi
#> 1 20.28439 0.03912930 0.03765585 0.1203159
#> 2 19.30457 0.07435579 0.06920965 0.1526715

Created on 2023-04-12 with reprex v2.0.2
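If I read the mice source correctly, `pool()` accepts the plain list because it coerces its input with `mice::as.mira()`, which wraps a list of fitted models into a `mira` object. A minimal sketch of borrowing that approach at the top of `plot_variance()`; the helper name `to_mira()` and the error wording are hypothetical, and `mids` input is assumed to be handled by its own branch before this check.

```r
library(mice, warn.conflicts = FALSE)

# Hypothetical helper: accept either a mira object or a plain list of fitted
# models (the output of the mild workflow) and always return a mira object.
to_mira <- function(x) {
  if (is.mira(x)) {
    return(x)
  }
  if (is.list(x) && !is.data.frame(x) && !is.mids(x)) {
    return(as.mira(x))
  }
  stop(
    "Input is not a `mira` object or a list of fitted models. ",
    "Perhaps with() (mice::with.mids()) or mice::as.mira() can be of use?",
    call. = FALSE
  )
}

# mild workflow without purrr: one lm() per completed dataset
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)
mild_mira <- lapply(complete(imp, "all"), function(d) lm(bmi ~ chl, data = d))

fit <- to_mira(mild_mira)
class(fit)
length(fit$analyses)
```

Both workflows then reach the plotting code as the same kind of object, while a data frame still triggers an error that points to `with()`.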

@hanneoberman
Member

Nice work, @KyuriP! One more thing: could you maybe change the discrete scale for the variance to a continuous one to match the plot_variance() output for data frames?

@hanneoberman
Member

Okay, now there is just one error message in the example and one NOTE remaining :)

checking R code for possible problems ... NOTE
  plot_variance: no visible binding for global variable '.'
  plot_variance: no visible binding for global variable 'm'
  plot_variance: no visible binding for global variable '.fitted'
  plot_variance: no visible binding for global variable '.resid'
  plot_variance: no visible binding for global variable 'avg'
  plot_variance: no visible binding for global variable 'observed'
  plot_variance: no visible binding for global variable 'vrn'
  Undefined global functions or variables:
    . .fitted .resid avg m observed vrn

0 errors ✔ | 0 warnings ✔ | 1 note ✖
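One common way to silence these NOTEs is to declare the names that `plot_variance()` uses through non-standard evaluation with `utils::globalVariables()`, typically in a separate package file. A minimal sketch using exactly the names listed in the NOTE; the file name `R/globals.R` is only a convention. Referring to columns through the `.data` pronoun is an alternative for the column names, but it does not cover the magrittr placeholder `.`.

```r
# R/globals.R (hypothetical file name)
# Names referenced via non-standard evaluation in plot_variance();
# declaring them stops R CMD check from reporting
# "no visible binding for global variable".
utils::globalVariables(c(".", "m", ".fitted", ".resid", "avg", "observed", "vrn"))
```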

@hanneoberman
Member

Optional: add a 'perfect prediction' line with geom_abline(intercept = 0, slope = 1)?
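A minimal ggplot2 sketch of that layer on a predicted-versus-observed plot. It assumes per-record summaries named `observed`, `avg` and `vrn` (illustrative names, computed here with `fitted()` for brevity). Because `vrn` is numeric, mapping it to colour gives a continuous scale; a discrete legend would suggest the variable ended up as a factor or character somewhere. Records without an observed `bmi` (9 in `nhanes`) cannot be placed on the x axis, which is likely where the "Removed 9 rows" warnings above come from; `na.rm = TRUE` drops them silently.

```r
library(mice, warn.conflicts = FALSE)
library(ggplot2)

imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ chl))

# n x m matrix of fitted values (records x imputations)
fitted_mat <- sapply(fit$analyses, fitted)

plot_data <- data.frame(
  observed = nhanes$bmi,           # NA for records with missing bmi
  avg = rowMeans(fitted_mat),
  vrn = apply(fitted_mat, 1, var)  # numeric, so ggplot2 uses a continuous scale
)

p <- ggplot(plot_data, aes(x = observed, y = avg, colour = vrn)) +
  # na.rm = TRUE drops records without an observed outcome without a warning
  geom_point(na.rm = TRUE) +
  # 'perfect prediction' reference line: predicted == observed
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  scale_colour_continuous() +
  labs(x = "Observed bmi", y = "Average predicted bmi", colour = "Variance")
print(p)
```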

@hanneoberman
Member

Another addition: the vrb argument that all other ggmice functions have!

@hanneoberman changed the title from "Create plot_fnc.R" to "Plot variance of predicted values after imputation" on Dec 21, 2023
@hanneoberman
Member

Status check: requested edits not implemented. Converting to draft.

@hanneoberman hanneoberman marked this pull request as draft October 10, 2024 14:38
3 participants