Use StatsAPI #236

alyst · 2025-01-15T21:32:33Z

One problem I have run into is that both SEM and StatsAPI define params(), so if the user loads both in the same session (StatsAPI, e.g., via Distributions), the names collide.
One solution is to import StatsAPI and use its params() function.
(The context is very similar; however, params() in StatsAPI is supposed to return the actual values of the parameters, whereas the SEM one returns their names. There is also a StatsAPI.coefnames() function.)

Other StatsAPI functions could be reused as well, e.g. StatsAPI.dof() (instead of SEM.df()), pvalue(), fit(), etc.
Overall, that would make SEM.jl better integrated into Julia's ecosystem of statistical models.
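
For illustration, here is a minimal sketch of how extending a shared generic function avoids the collision. It uses two toy modules rather than the real packages: ToyStatsAPI, ToySEM, and ToyModel are hypothetical stand-ins, and the sketch says nothing about how SEM.jl is actually organized.

```julia
# Hypothetical stand-in for StatsAPI: it owns the generic function `params`.
module ToyStatsAPI
export params
function params end   # declared without methods, as an API package would do
end

# Hypothetical stand-in for SEM.jl after the proposed change: it extends the
# shared generic function instead of defining its own `params`.
module ToySEM
import ..ToyStatsAPI: params
export ToyModel, params

struct ToyModel
    param_values::Vector{Float64}
end

# Adds a method to ToyStatsAPI.params; no second function named `params` is created.
params(model::ToyModel) = model.param_values
end

# Loading both modules is unambiguous, because both export the same function.
using .ToyStatsAPI, .ToySEM

model = ToyModel([0.5, 1.5])
println(params(model))
```

If ToySEM instead defined its own unrelated `params`, loading both modules and calling an unqualified `params` would be ambiguous, which is the situation described above.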

aaronpeikert · 2025-01-24T11:16:55Z

@alyst I want to get a bit more involved/helpful in SEM.jl, and this seems like an issue I could tackle. Any PR of mine would probably require a careful eye. Should I take a stab at it?

alyst · 2025-01-24T18:09:47Z

@aaronpeikert Sure, that would be great! I can help you if you need my review.

aaronpeikert · 2025-01-25T11:29:23Z

alyst · 2025-01-25T18:30:17Z

It is a nice plan! Some comments:

StatsAPI.jl support could be implemented in phases. We can classify the items according to these phases:
- Phase 1 would be to resolve current name clashes, like params, because if I'm using params from SEM.jl in my script/package, and then I add StatsAPI.jl (or one of the packages that re-export it), the script/package cannot be compiled.
- Phase 2 would be to rename existing SEM.jl functions to match StatsAPI.jl or add synonyms for the StatsAPI.jl functions (e.g. SEM.df does not clash, but it could be StatsAPI.dof).
- Phase 3 would be to implement as many StatsAPI.jl functions as possible (where it makes sense).

Some specific items:

- AbstractSem <: StatisticalModel: it would be nice if some SEM.jl types were derived from StatsAPI types, but there are also Sem and SemFit, so that may require more consideration. It looks like StatsAPI assumes that fitting modifies the existing statistical model (maybe that is inspired by R). That's not ideal from the flow-of-information PoV and complicates both the internal and the user script logic. I like how SEM.jl distinguishes between the model definition Sem and the fitted model SemFit, so SEM.jl may diverge from StatsAPI in that aspect.
- SEM.params -> SEM.paramnames or SEM.coefnames: renaming to paramnames makes sense to me. The internals of SEM.jl (e.g. ParamsMatrix) rely on parameter indices rather than parameter IDs.
- coef as a params synonym: that's "phase 2", so we can decide on it later.
- confint: would be nice to have, especially since the manuscript tutorial has user code to calculate it; it would be more convenient to have it in the package.
- isfitted: for Sem (the model definition) it can return false or throw an exception saying that one has to run fit(Sem) first; isfitted(SemFit) would always be true.
- fit: I like the idea that fit(Sem) returns SemFit.
- pvalue: why should it be independent from Distributions.jl?
- score: phase 3.
- loglikelihood/nullloglikelihood: that's phase 3.
- nobs: could be a synonym for nsamples (the reason SEM.jl uses nsamples is to disambiguate from observed variables). SEM.jl tutorials may use nsamples only, but the code may allow nobs for those familiar with StatsAPI.
- AICc: that's phase 3.
- predict: I think predict(SemFit, data) could be used to predict latent variable values for each sample. In fact, the support for predict is in my staging branch. imply/implied is SEM-specific; I think it should not be synonymous with predict.
- throw MethodError for other functions: that would be necessary if SEM.jl types are derived from StatisticalModel, but this decision could be postponed until phase 3.
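
To make the fit/isfitted/nobs items concrete, here is a rough sketch of the "definition vs. fitted model" split with toy types. Everything in it is hypothetical: ToySem, ToySemFit, and the toy_* functions are illustrative names, the "fit" does no real optimization, and in an actual implementation these would presumably be methods of the corresponding StatsAPI functions rather than standalone functions.

```julia
# Hypothetical model definition: holds only the specification.
struct ToySem
    param_names::Vector{Symbol}
end

# Hypothetical fitted model: wraps the definition plus the results.
struct ToySemFit
    model::ToySem
    solution::Vector{Float64}
    nsamples::Int
end

# "Fitting" returns a new object and leaves the definition untouched.
# The zero vector is a placeholder for a real optimizer result.
function toy_fit(model::ToySem, data::AbstractMatrix)
    solution = zeros(length(model.param_names))
    return ToySemFit(model, solution, size(data, 1))
end

# A definition is never fitted; a fit result always is.
toy_isfitted(::ToySem) = false
toy_isfitted(::ToySemFit) = true

# nsamples is the primary name; the nobs-style name is just an alias for it.
toy_nsamples(fit::ToySemFit) = fit.nsamples
const toy_nobs = toy_nsamples

sem = ToySem([:a, :b, :c])
semfit = toy_fit(sem, ones(10, 4))
println((toy_isfitted(sem), toy_isfitted(semfit), toy_nobs(semfit)))
```

With this layout, the question of mutating the model during fitting does not arise: the definition stays immutable, and everything that depends on data lives in the fit object.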

It would also be nice to get feedback from the StatsAPI.jl maintainers. While they may not have considered use cases like SEM.jl originally, they may have some ideas on how StatsAPI.jl could be tweaked to support it.

Maximilian-Stefan-Ernst mentioned this issue Jan 23, 2025: v0.3.0 #212 (Open, 22 tasks)
