
writeH5AD fails for very large datasets (> 1.5 million cells) #73

Closed
GabrielHoffman opened this issue Sep 21, 2022 · 4 comments
Labels
bug Something isn't working

Comments


GabrielHoffman commented Sep 21, 2022

Hi Luke,
Thanks again for the package, I use it every day!

I have a huge H5AD file of 40k genes and 3.7M cells. I load it into R with readH5AD(..., use_hdf5 = TRUE). After QC and filtering I want to write 1.5M cells to another H5AD file. When I use writeH5AD(sce[, include], outfile) I get a segfault after ~20 minutes. Memory shouldn't be an issue since I requested 576 GB RAM on my compute node. I managed to work around this by 1) writing the SingleCellExperiment as 4 chunks to separate H5AD files, then 2) using AnnData in Python to concatenate the 4 files into a single H5AD (see the sketch below).
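A minimal sketch of that chunked write, assuming the filtered object is sce[, include] and using placeholder output file names:

```r
library(zellkonverter)

# Filtered SingleCellExperiment (object and index names are illustrative)
sce_filtered <- sce[, include]

# Split the cells into 4 roughly equal column-wise chunks
n_cells <- ncol(sce_filtered)
chunk_id <- cut(seq_len(n_cells), breaks = 4, labels = FALSE)
chunks <- split(seq_len(n_cells), chunk_id)

# Write each chunk to its own H5AD file; the chunks are then
# concatenated into a single H5AD with anndata on the Python side
for (i in seq_along(chunks)) {
  writeH5AD(sce_filtered[, chunks[[i]]], sprintf("filtered_chunk_%d.h5ad", i))
}
```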

I am using R 4.2.0 and zellkonverter v1.6.5.

Have you encountered this issue with large datasets? I wanted to check with you first since creating a reproducible example I can share will take a substantial amount of work.

Best,
Gabriel

@lazappi lazappi changed the title writeH5AD fails for large dataset writeH5AD fails for very large datasets (> 1.5 million cells) Sep 29, 2022
@lazappi lazappi added the bug label Sep 29, 2022

lazappi commented Sep 29, 2022

Hi @GabrielHoffman

That is indeed a large dataset! I think the largest I have ever tried is a few hundred thousand cells. I'm actually fairly impressed that you managed to work with it in both R and Python, and that it's just the conversion that seems to be the issue.

Have you tried running it with verbose = TRUE? That would be helpful for figuring out which part is failing.
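For example, reusing the object and file names from the original report:

```r
# Same failing call, with progress messages enabled so the conversion
# step that crashes is easier to identify
writeH5AD(sce[, include], outfile, verbose = TRUE)
```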


lazappi commented Oct 4, 2024

@GabrielHoffman I am closing this as old, but if you want to discuss it further we can reopen it.

@lazappi lazappi closed this as not planned Oct 4, 2024
@stemangiola

stemangiola commented
I would also be interested in knowing whether there is any parallelization or block-size argument that we can use to speed up the saving.

Thanks!


lazappi commented Nov 13, 2024

@stemangiola The short answer is no, not at the moment, but it sounds like there could be some discussion, so please open another issue if you want to discuss it further.
