Hi Luke,
Thanks again for the package, I use it every day!
I have a huge H5AD file of 40k genes and 3.7M cells. I load it into R with readH5AD(..., use_hdf5 = TRUE). After QC and filtering I want to write 1.5M cells to another H5AD file. When I use writeH5AD(sce[, include], outfile) I get a segfault after ~20 minutes. Memory shouldn't be an issue since I requested 576 GB of RAM on my compute node. I managed to work around this by 1) writing the SingleCellExperiment as 4 chunks to separate H5AD files, then 2) using AnnData in Python to concatenate the 4 files into a single H5AD.
I am using R 4.2.0 and zellkonverter v1.6.5.
Have you encountered this issue with large datasets? I wanted to check with you first since creating a reproducible example I can share will take a substantial amount of work.
Best,
Gabriel
lazappi changed the title from "writeH5AD fails for large dataset" to "writeH5AD fails for very large datasets (> 1.5 million cells)" on Sep 29, 2022
That is indeed a large dataset! I think the largest I have ever tried is a few hundred thousand cells. I'm actually fairly impressed you managed to work with it in both R and Python and that it's just the conversion that seems to be the issue.
Have you tried running it with verbose = TRUE? That would be helpful for figuring out which part is failing.
@stemangiola The short answer is not at the moment but it sounds like there could be some discussion so please open another issue if you want to discuss it further.