Steps for features aggregation #988

Closed
abichat opened this issue May 24, 2022 · 6 comments
Labels
extension package: feature request that would be more appropriate to implement as an extension package

Comments

@abichat
Contributor

abichat commented May 24, 2022

Feature

In situations where there are many correlated features, as in transcriptomics data, it is common to compute an aggregated score for each group of correlated variables (sometimes called modules, sets, metagenes...). These summary scores can be computed as a mean, a z-score, an eigenvalue... of the features in the module. There are some examples of scores here.
In addition, modules can be obtained in other ways (other steps), for example from a priori knowledge or from the WGCNA algorithm.
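
To make the idea concrete, here is a minimal, language-agnostic sketch of mean-based module scores. It is not the API of recipes, embed, or any existing package: the function names `zscore` and `module_scores`, the gene names, and the toy values are all hypothetical, and the module list is assumed to come from prior knowledge.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one feature across samples (population sd); constant features map to 0."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def module_scores(data, modules, standardize=False):
    """Aggregate features into one score per module and per sample.

    data:    dict mapping feature name -> list of values (one per sample)
    modules: dict mapping module name -> list of feature names (assumed known a priori)
    """
    scores = {}
    for name, features in modules.items():
        columns = [zscore(data[f]) if standardize else data[f] for f in features]
        # zip(*columns) iterates over samples; each score is the mean over the module's features
        scores[name] = [mean(sample) for sample in zip(*columns)]
    return scores


# Hypothetical toy data: three samples, three features, two modules
data = {
    "gene_a": [1.0, 2.0, 3.0],
    "gene_b": [3.0, 4.0, 5.0],
    "gene_c": [10.0, 0.0, 5.0],
}
modules = {"module_1": ["gene_a", "gene_b"], "module_2": ["gene_c"]}
scores = module_scores(data, modules)
```

With `standardize=True` the same function gives a mean-of-z-scores variant; an eigenvalue-based score would need a decomposition step that is left out here.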

What do you think? These new steps could be included in recipes, embed, or a new package dedicated to feature aggregation.

Thank you for all your work!

@EmilHvitfeldt
Member

EmilHvitfeldt commented Jun 8, 2022

As far as I can tell, this is something you could put into recipe steps, and there would be some need for it because there are domains that use these methods a lot. I don't think such methods would be appropriate in {recipes}; they have some {embed} flavor, but right now I feel (not strongly) they would go best in a feature aggregation extension package.

I would need to do more reading to make sure, but an essential part of a recipe step is to be able to reapply the same transformation that was applied to the training data set to other datasets, and I'm not sure if these methods are re-apply-able.
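
As a rough illustration of what "re-apply-able" would mean here, the sketch below splits a mean-of-z-scores aggregation into a fit phase and an apply phase, similar in spirit to the prep/bake split in recipes. This is only a sketch under assumptions, not how recipes or any extension package implements it: `fit_aggregation`, `apply_aggregation`, and the toy data are hypothetical, and the modules are assumed to be fixed in advance.

```python
from statistics import mean, pstdev


def fit_aggregation(train, modules):
    """Learn everything needed to reapply the step, using training data only."""
    params = {feature: (mean(values), pstdev(values)) for feature, values in train.items()}
    return {"modules": modules, "params": params}


def apply_aggregation(fitted, new_data):
    """Score new samples with the stored training means and sds (no re-estimation)."""
    out = {}
    for name, features in fitted["modules"].items():
        columns = []
        for feature in features:
            m, s = fitted["params"][feature]
            columns.append([(v - m) / s if s > 0 else 0.0 for v in new_data[feature]])
        # One score per new sample: mean of the standardized features in the module
        out[name] = [mean(sample) for sample in zip(*columns)]
    return out


# Hypothetical training data and a single new sample
train = {"gene_a": [1.0, 2.0, 3.0], "gene_b": [2.0, 4.0, 6.0]}
fitted = fit_aggregation(train, {"module_1": ["gene_a", "gene_b"]})
new_scores = apply_aggregation(fitted, {"gene_a": [2.0], "gene_b": [4.0]})
```

For this simple score the transformation is re-applicable because the module membership and the per-feature statistics are stored at fit time. Whether the same holds for data-driven module detection is a separate question that this sketch does not answer.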

@EmilHvitfeldt EmilHvitfeldt added the "extension package" label (feature request that would be more appropriate to implement as an extension package) Mar 30, 2023
@abichat
Contributor Author

abichat commented Mar 12, 2024

Two years later, I've finally created a package that incorporates these types of steps (though not only these; it's dedicated to omics data).
Right now, there is only aggregation based on prior knowledge (from a list) or on hierarchical clustering (from hclust).
https://github.com/abichat/scimo

@EmilHvitfeldt
Member

@abichat that looks very interesting! thank you for cross posting. I'll do my best to take a look at the package next week!

@EmilHvitfeldt
Member

This is very exciting!

I'm going to close this issue, as I think this is a good solution to the problem raised here.

Feel free to open issues here for any changes in {recipes} that would make your life easier. I'm going to add a couple of issues with some comments I have.

@abichat
Contributor Author

abichat commented Mar 28, 2024

Thank you so much for your time and feedback!


This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2024