Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle large matrices and change default Seurat format #21

Merged
merged 5 commits into from
Dec 18, 2024

Conversation

jashapiro
Copy link
Member

When testing AlexsLemonade/OpenScPCA-analysis#953, I found that the sum counts function doesn't work with very large matrices, namely unfiltered matrices. We probably don't really need to support that, but it wasn't too hard to do, so I added a bit of splitting and merging to avoid ever creating a single very large non-sparse matrix.

(There was a fun edge case I ran into when the matrix was one larger than a multiple of the chunk size...)

After some more Seurat testing, I also discovered that v5 is really the default for new objects, so we might as well make that our default too. So there are bunch of test changes related to that change in defaults.

@jashapiro jashapiro requested a review from sjspielman December 17, 2024 20:08
Copy link
Member

@sjspielman sjspielman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me, and I like the strategy you used to split up matrices without actually splitting them up 🤙

The only comment I have is whether it's worth including a large matrix to test with, since our tests clearly didn't cover that exactly. Our test data here is 100 cells, and I guess we could "multiply" it by 50 (51?) and unique-ify barcodes along the way. That said, we don't actually test values of summed counts in the first place here, so this would require several more tests as a whole. I'm kind of on the fence for whether this is necessary, so I'm going to approve this and if you think it's a good idea to circle back to, then we can open an issue for future us.

@jashapiro
Copy link
Member Author

I added an argument to set the cell chunk size, which made it easy to add some extra tests without making huge matrices.

@jashapiro jashapiro merged commit 2deaa20 into main Dec 18, 2024
2 checks passed
@jashapiro jashapiro deleted the jashapiro/largematrix branch December 18, 2024 16:38
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐰
i was typing for a tada emoji but accidentally typed ra... , so enjoy this rabbit who likes the new tests!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants