Multilevel imputation does not accept character or factor variable as the cluster variable; must be integer #657

Open
isaactpetersen opened this issue Jul 31, 2024 · 1 comment

@isaactpetersen

Multilevel imputation does not appear to accept a character or factor variable as the cluster variable; the cluster variable apparently must be an integer. When using 2l.pmm from miceadds, I receive the same error as documented in the MICE discussion here, so the reproducible example below may explain why those users ran into the issue.

Here is a reprex (adapted from the MICE vignette here):

library("mice")
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
library("miceadds")
#> * miceadds 3.17-44 (2024-01-08 19:08:24)

# Data
con <- url("https://www.gerkovink.com/mimp/popular.RData")
load(con)

dataToImpute <- popNCR2

# Specify variables to impute
Y <- "popular"

# Imputation method
meth <- make.method(dataToImpute)
meth[1:length(meth)] <- ""

# Specify predictor matrix
pred <- make.predictorMatrix(dataToImpute)
pred[1:nrow(pred), 1:ncol(pred)] <- 0
pred[Y, "class"] <- (-2) # cluster variable
pred[Y, "extrav"] <- 1 # fixed effect predictor
diag(pred) <- 0

pred
#>          pupil class extrav sex texp popular popteach
#> pupil        0     0      0   0    0       0        0
#> class        0     0      0   0    0       0        0
#> extrav       0     0      0   0    0       0        0
#> sex          0     0      0   0    0       0        0
#> texp         0     0      0   0    0       0        0
#> popular      0    -2      1   0    0       0        0
#> popteach     0     0      0   0    0       0        0

# Character
dataToImpute$class <- as.character(dataToImpute$class)

meth[Y] <- "2l.norm"
imp1 <- mice(dataToImpute, pred = pred, meth = meth, maxit = 5, print = FALSE)
#> Error in mice.impute.2l.norm(y = c(6.3, 4.9, 5.3, 4.7, 4.5, 4.7, 5.9, : No class variable

meth[Y] <- "2l.pmm"
imp2 <- mice(dataToImpute, pred = pred, meth = meth, maxit = 5, print = FALSE)
#> Error in str2lang(x): <text>:1:24: unexpected ')'
#> 1: dv._lmer ~ 1+extrav+(1|)
#>                            ^

# Factor
dataToImpute$class <- as.factor(dataToImpute$class)

meth[Y] <- "2l.norm"
imp3 <- mice(dataToImpute, pred = pred, meth = meth, maxit = 5, print = FALSE)
#> Error in check.cluster(data, predictorMatrix): Convert cluster variable class to integer by as.integer()

meth[Y] <- "2l.pmm"
imp4 <- mice(dataToImpute, pred = pred, meth = meth, maxit = 5, print = FALSE)
#> Error in check.cluster(data, predictorMatrix): Convert cluster variable class to integer by as.integer()

# Integer
dataToImpute$class <- as.integer(dataToImpute$class)

meth[Y] <- "2l.norm"
imp5 <- mice(dataToImpute, pred = pred, meth = meth, maxit = 5, print = FALSE)

meth[Y] <- "2l.pmm"
imp6 <- mice(dataToImpute, pred = pred, meth = meth, maxit = 5, print = FALSE)

sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#> 
#> Matrix products: default
#> 
#> 
#> Random number generation:
#>  RNG:     Mersenne-Twister 
#>  Normal:  Inversion 
#>  Sample:  Rounding 
#>  
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: America/Chicago
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] miceadds_3.17-44 mice_3.16.0     
#> 
#> loaded via a namespace (and not attached):
#>  [1] utf8_1.2.4        generics_0.1.3    tidyr_1.3.1       shape_1.4.6.1    
#>  [5] lattice_0.22-6    lme4_1.1-35.5     digest_0.6.36     magrittr_2.0.3   
#>  [9] mitml_0.4-5       evaluate_0.24.0   grid_4.3.1        iterators_1.0.14 
#> [13] fastmap_1.2.0     foreach_1.5.2     jomo_2.7-6        glmnet_4.1-8     
#> [17] Matrix_1.6-5      nnet_7.3-19       backports_1.5.0   DBI_1.2.3        
#> [21] survival_3.7-0    purrr_1.0.2       fansi_1.0.6       codetools_0.2-20 
#> [25] cli_3.6.3         mitools_2.4       rlang_1.1.4       splines_4.3.1    
#> [29] reprex_2.1.1      withr_3.0.0       yaml_2.3.10       pan_1.9          
#> [33] tools_4.3.1       nloptr_2.1.1      minqa_1.2.7       dplyr_1.1.4      
#> [37] boot_1.3-30       broom_1.0.6       vctrs_0.6.5       R6_2.5.1         
#> [41] rpart_4.1.23      lifecycle_1.0.4   fs_1.6.4          MASS_7.3-60.0.1  
#> [45] pkgconfig_2.0.3   pillar_1.9.0      glue_1.7.0        Rcpp_1.0.13      
#> [49] xfun_0.46         tibble_3.2.1      tidyselect_1.2.1  rstudioapi_0.16.0
#> [53] knitr_1.48        htmltools_0.5.8.1 nlme_3.1-165      rmarkdown_2.27   
#> [57] compiler_4.3.1

Created on 2024-07-31 with reprex v2.1.1

@stefvanbuuren
Member

Thanks for your note. This is indeed a problem case that is not correctly caught.

The problem is caused by the automatic removal of character variables at initialization. mice logs each such removal in loggedEvents. However, we never see these messages because the program crashes before it can return a mids object.
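Until such cases are caught inside mice, a pre-flight check in user code can flag the problem before the crash. The following is only a minimal sketch in base R: `check_cluster_types()` is a hypothetical helper, not part of mice or miceadds, and the toy data frame merely stands in for popNCR2. It assumes cluster variables are the columns marked with -2 in the predictor matrix, as in the reprex.

```r
# Hypothetical helper (NOT part of mice): list cluster variables that are
# not stored as integer or numeric, so they can be fixed before calling mice().
check_cluster_types <- function(data, predictorMatrix) {
  # A column holding -2 in any row is treated as a cluster variable.
  is_cluster <- apply(predictorMatrix == -2, 2, any)
  cluster_vars <- colnames(predictorMatrix)[is_cluster]
  # First class of each cluster column, e.g. "character" or "factor".
  classes <- vapply(data[cluster_vars], function(x) class(x)[1], character(1))
  classes[!classes %in% c("integer", "numeric")]
}

# Toy data standing in for popNCR2 (assumption, for illustration only).
toy <- data.frame(
  class   = c("a", "a", "b", "b"),
  extrav  = c(5, 4, 6, 3),
  popular = c(6.3, NA, 5.3, 4.7),
  stringsAsFactors = FALSE
)
pred <- matrix(0, 3, 3, dimnames = list(names(toy), names(toy)))
pred["popular", "class"]  <- -2  # cluster variable
pred["popular", "extrav"] <- 1   # fixed effect predictor

bad <- check_cluster_types(toy, pred)
print(bad)
```

A non-empty result names the cluster columns that would need converting before the call to mice().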

More generally, the handling of cluster variables could be improved, and better support could be provided for factor, character, integer and numeric cluster variables.
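As a stopgap on the user side, a character or factor cluster variable can be recoded to integer codes while keeping a lookup table of the original labels. This is a sketch only: `recode_cluster()` is a hypothetical helper, not an existing mice function, and the labels are made up. Note that `as.integer()` on a factor returns the level codes rather than the original values, which is why the lookup table is kept.

```r
# Hypothetical helper (NOT part of mice): recode a character or factor
# cluster variable to integer codes and keep the code-to-label mapping.
recode_cluster <- function(x) {
  f <- factor(x)  # levels are the sorted unique labels; NA stays NA
  list(
    codes  = as.integer(f),  # level codes, usable as an integer cluster id
    lookup = data.frame(
      code  = seq_along(levels(f)),
      label = levels(f),
      stringsAsFactors = FALSE
    )
  )
}

# Made-up labels for illustration.
cls <- c("school_b", "school_a", "school_b", NA)
rc <- recode_cluster(cls)
rc$codes   # 2 1 2 NA
rc$lookup  # code 1 = school_a, code 2 = school_b
```

The integer codes can then replace the cluster column before the call to mice(), and the lookup table maps the codes back to the labels afterwards.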

Something for the wish list. Not a priority for me right now, but I'd be happy to take any pull requests.
