Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement data preloading #361

Merged
merged 4 commits into from
Feb 13, 2025
Merged

Conversation

kylelang
Copy link
Contributor

This PR makes three meaningful and two trivial adjustments.

Meaningful

  1. Implement data preloading for all analyses
  2. Update unit tests to work with data preloading and typed data
    • Adding a typed RDS file "tests/testthat/testData.rds" to make life easier
  3. Adding two workaround functions in "R/commonClassicalRegression.R" to harmonize the column names stored in options$modelTerms.
    • These names are not properly coded when the options list is parsed.
      • I.e., they still get the old encoding that doesn't provide type information.
    • I suspect that the modelTerms element may be processed too early (i.e., before the column names in dataset are properly encoded), but that's pure speculation.
    • The workaround functions simply retrieve the correctly encoded names from dataset by decoding both the modelTerms names and column names in dataset back to the user-supplied versions and mapping from there.
    • If the encoding issue is fixed in jaspBase, this hack won't be necessary, but it works for now and doesn't seem to cause any problems.

Trivial

  1. Adjust the loop index in .computeCorBayes() to only calculate the stats requested by the user.
  2. Changed the data used for the Kendall's Tau tests in Bayesian Correlation.
    • These tests were persistently crashing somewhere in bstat.
    • The putative effect size in the crashy tests was really small, and that small effect looked to be causing the convergence problems.
    • The tests ran fine when using data with a larger effect, so I changed the data.
    • Since it looks like nobody has touched bstat or the Bayesian correlation analyses in three years, I didn't think it was worth the effort of trying to get the sampler to play nice with the small effect.

added a workaround to harmonize the column encoding for the names in 'modelTerms'
changed the example data used for the kendall's tau tests in bayesian correlation because bstat kept crashing due to the very small effect size in the old example data (at least, that's my hypothesis)
@kylelang kylelang marked this pull request as ready for review January 16, 2025 00:30
@kylelang
Copy link
Contributor Author

@JorisGoosen, I added you as a reviewer so you could see what I did about the encoding issue we discussed on Tuesday. Specifying the allowed types in the QML and setting allowTypeChange: true had no effect.

@kylelang kylelang requested a review from vandenman February 7, 2025 13:38
@kylelang
Copy link
Contributor Author

kylelang commented Feb 7, 2025

@vandenman, I haven't been able to get Johnny's attention (via Mattermost or email). Are you still willing to review this PR?

@@ -63,7 +63,8 @@ Form
availableVariablesList.source: ['covariates', 'factors']
startIndex: 0
availableVariablesListName: "availableTerms"
allowedColumns: []
allowedColumns: ['scale', 'nominal']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add these specifications? The column type specification already happens in the main menu, where the user specifies either continuous (covariates) or factors (categorical) predictors, and those types are then respected in the model box. Allowing the type to change seems a bit weird, also since it does not get changed in all models and therefore causing non-nested model comparisons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is a vestige of an attempt to fix the variable name encoding problems that I ended up working around with the routines defined in "R/commonClassicalRegression.R". That attempt didn't work, but I forgot to undo this change.

@JohnnyDoorn
Copy link
Contributor

@kylelang great changes! just two things:

  • the type specification is not necessary I believe
  • Poisson regression is not working in GLM :
image

forcing conversion to scale for DV in poisson regression
@kylelang
Copy link
Contributor Author

kylelang commented Feb 10, 2025

The crash in Poisson regression was caused by the error-checking routines trying to analyze the factor DV as a numeric variable. Since the data preloading types the variables before they get to R, a nominal or ordinal DV gets to R as a factor.

Since we always want the DV in Poisson regression to be an integer (I think), it seems cleanest to force the appropriate type conversion in the QML form. That's what I've implemented in my latest commit. We could also do the type conversion in R, but it feels simpler and more transparent to do the conversion where the user can see what's happening.

@kylelang
Copy link
Contributor Author

@JohnnyDoorn, this PR is ready for re-review.

@JohnnyDoorn JohnnyDoorn merged commit ab90540 into jasp-stats:master Feb 13, 2025
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants