
An unknown error #142

Open
xiadawei123 opened this issue Nov 14, 2023 · 34 comments

@xiadawei123

Hi

I am using Chromap, which you developed, to map Hi-C reads, but an error occurred during the alignment and I hope you can help. Below are my commands and the error. Thank you.

conda create -n chromap_yahs -c bioconda -c conda-forge chromap samtools yahs assembly-stats openjdk
samtools faidx contig.fa
chromap -i -r contig.fa -o index -w 14 -k 27
nohup chromap --preset hic -r contig.fa -x index --remove-pcr-duplicates -1 R1.fastq -2 R2.fastq --SAM -o aligned.sam -t 90 &

[Screenshot of the error message]
@haowenz
Owner

haowenz commented Nov 14, 2023

It looks like some temp output files are missing. Can you check and make sure you have enough disk space for output files?

@xiadawei123
Author


OK, thank you for your prompt reply. I will increase the memory and try again.

@xiadawei123
Author


Hi, after making sure the server has sufficient memory (1.4T) and disk space, I reran the Hi-C alignment with Chromap, but I still get the same error. I'm not sure whether it's due to insufficient memory or something else. Could you help me with this? Thank you.

@haowenz
Owner

haowenz commented Nov 20, 2023

As I mentioned, you have too many reads, so you need to make sure there is enough disk space for your output. The error message indicates that there isn't. Memory is not related at all.

@haowenz
Owner

haowenz commented Nov 20, 2023

It would be great if you could check whether you have enough disk space for your output. If not, maybe delete some of your old files to make enough space for the output. Increasing memory does not help in this case.

@haowenz
Owner

haowenz commented Nov 20, 2023

Besides, why did you use -k 27? Did the default value work?

@xiadawei123
Author


I have a 47T disk, so I think there should be enough space. Is there any other possible reason?
Since the genome is close to 10G, I followed your previous advice to others to increase k. I also tried the default k, but I still got the same error.

@haowenz
Owner

haowenz commented Nov 20, 2023

I see. Can you run a command to check your available disk space? I forget the exact command. It might be "du -sh" or something else.

@xiadawei123
Author


Yes, I often use du -sh or df -h to check disk space, and I reserved 47T of space for the Chromap Hi-C alignment. Thank you very much for your reply. I will run it again and check at the end whether all 47T is used up.
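For anyone following along: `df -h` reports the size, used and available space of the filesystem that contains a path, while `du -sh` reports how much space a directory itself already occupies, so `df` is the one that answers "is there room for the output?". A minimal sketch, assuming the output goes to the current directory (the `OUT_DIR` variable is a hypothetical override, not a chromap option):

```bash
#!/usr/bin/env bash
# Sketch: report free space where chromap writes its output and temp files.
# OUT_DIR is an assumption: the directory that will contain the -o file.
OUT_DIR="${OUT_DIR:-.}"

# df: size / used / available for the filesystem that holds OUT_DIR
df -h "$OUT_DIR"

# du: how much space OUT_DIR itself already uses
du -sh "$OUT_DIR"

# Available space in 1K blocks, for scripting (-P keeps each entry on one line)
avail_kb=$(df -Pk "$OUT_DIR" | awk 'NR == 2 {print $4}')
echo "available_kb=$avail_kb"
```

Running it again while chromap is writing temp files shows whether the available figure actually drops toward zero.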

@xiadawei123
Author


Hi, there's plenty of disk space, so I don't think it's a disk-space problem. If you have any ideas to solve it, please let me know. Thank you.

@haowenz
Owner

haowenz commented Nov 22, 2023

This is weird. After the run, did you check whether the temporary mapping files are in the output dir? You may run "ls" and see if they are there. And can you remove "--remove-pcr-duplicates" from the command line? I guess it is not very useful for Hi-C. How many sequences are there in your contig.fa file?
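One way to do the "ls" check systematically is to look for gaps or empty files in the numbered temp-file series. This is only a sketch: it assumes temp files are named `<output>.temp<N>` (as in the `aligned.sam.temp1019` name reported later in this thread) and numbered from a configurable start index; `check_temp_files` is a hypothetical helper, not part of Chromap.

```bash
#!/usr/bin/env bash
# Sketch: report missing or empty files in a "<prefix>.temp<N>" series.
# Naming and the start index are assumptions, not documented chromap behavior.
check_temp_files() {
  local prefix=$1 start=${2:-0} max=-1 f idx i
  for f in "$prefix".temp*; do
    [ -e "$f" ] || continue                      # glob matched nothing
    idx=${f##*.temp}
    case $idx in ''|*[!0-9]*) continue ;; esac   # skip non-numeric suffixes
    idx=$(expr "$idx" + 0)                       # normalise leading zeros
    if [ "$idx" -gt "$max" ]; then max=$idx; fi
  done
  i=$start
  while [ "$i" -le "$max" ]; do
    f="$prefix.temp$i"
    if [ ! -e "$f" ]; then
      echo "missing: $f"
    elif [ ! -s "$f" ]; then
      echo "empty: $f"
    fi
    i=$((i + 1))
  done
  echo "highest index: $max"
}

check_temp_files "${PREFIX:-aligned.sam}" 0
```

If every file is present and non-empty but Chromap still reports one as missing, the problem is more likely in opening the file than in writing it, which is the direction the later comments in this thread take.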

@haowenz
Owner

haowenz commented Nov 22, 2023

Besides, can you show the beginning of your log?

@xiadawei123
Author


Yes, I also find it very strange. The program successfully generated a large number of temporary files, each approximately 1 GB in size. I added "--remove-pcr-duplicates" because I need to use the SAM file from Chromap as input for YaHS to build chromosomes, and the YaHS instructions emphasize that PCR duplicates must be removed from the SAM file. Below I show some of the temporary files and the beginning of the log file. If you have any additional suggestions, please let me know. Thank you again for your response.

[Screenshots: temporary files and the beginning of the log]

@haowenz
Owner

haowenz commented Nov 23, 2023

The error message indicates that chromap was trying to open a temp mapping file but could not find it. Initially, I assumed your disk was full, so the temp mapping files could not be generated and thus could not be opened. But it seems that this is not the case. I didn't see any errors in the log.

This is hard to debug on my side because it is hard for us to reproduce the error. If the dataset is publicly available, we can download it and try it. Otherwise, we have to change the code a little so that it generates more error messages, and ask you to try again so that we can understand what exactly happened. Alternatively, you can use bwa-mem for your pipeline. It would be much slower than Chromap in this case, but it might work.

@xiadawei123
Author


Thanks again for your timely reply. We have been trying several methods for chromosome construction in parallel, including bwa mem. Yesterday I moved to a more powerful server and tried running Chromap again. If there is any problem, I will let you know promptly.

@huang-0323

I have the same issue, and I'm sure my disk space is sufficient. May I ask whether there has been any progress or a resolution? I appreciate your time and assistance.

@haowenz
Owner

haowenz commented Nov 28, 2023

Can you provide your log?

@huang-0323

huang-0323 commented Nov 28, 2023

[Screenshot]
It creates a bunch of temp files, and the log shows:
[Screenshot of the log]
My command line is:
nohup chromap --preset hic -r /home/data3/hsh/genome/maguan_goat_assembly/02.genome_with_hic_hifiasm/M11/M11.hic.p_ctg.fa -x /home/data3/hsh/genome/maguan_goat_assembly/02.genome_with_hic_hifiasm/M11/M11.hic.p_ctg.index --remove-pcr-duplicates -1 /home/data3/hsh/genome/maguan_goat_assembly/03.scaffold/M11/fastq/M11_hic_merge_R1.fastq.gz -2 /home/data3/hsh/genome/maguan_goat_assembly/03.scaffold/M11/fastq/M11_hic_merge_R2.fastq.gz --SAM -o M11.hic.aligned.sam -t 100 >> M11_chromap_align.log 2>&1 &

@ghost

ghost commented Jan 4, 2024

Hello, is this issue solved? I also encountered a similar issue, and I suspect it may be caused by the size of the temp files. My server has 1.5T of memory and 15T of free disk space. Did you check the TempMappingFileHandle code (temp_mapping.h)? Maybe the files are too big for it to handle.

@huang-0323


Not yet. I think the SAM output function has a bug; the other output options (--BED/--TagAlign) work fine.

@haowenz
Owner

haowenz commented Jan 5, 2024

@xiadawei123 Were you able to run Chromap as you mentioned?

If any of you are using publicly available datasets, please let me know and I can try to reproduce the error. It is impossible to debug with only these error messages.

@utpala101

@xiadawei123 Hi, have you solved the problem? I have the same problem with a smaller genome (about 4G), and there is enough disk space to run it.

@xiadawei123
Author

xiadawei123 commented Apr 30, 2024 via email

@mourisl
Collaborator

mourisl commented Apr 30, 2024

@utpala101 In the new version, Chromap prints an error message naming the temp file it tries to open. This may help with debugging. Did the same error occur on your data?

@utpala101

@mourisl Yes, I have the same error. Chromap prints that a temp SAM file is missing, but the file is in the directory, so I don't know what's wrong.

@utpala101

@xiadawei123 Thank you so much for your timely reply! I will further look for some way.

@mourisl
Collaborator

mourisl commented Apr 30, 2024


What is the file name? Is it empty?

@utpala101

Sorry, I have already deleted the file, but it was not empty. The file name was aligned.sam.temp1019.

@mourisl
Collaborator

mourisl commented May 9, 2024

Thank you for sharing the information. I think this may be related to the limit on the number of files a program can open at once on a Linux machine, where the default is 1024. Counting the input and output files as well, the 1019 temp files may reach that limit. We will add an option to specify the number of reads in each temp file so that the number of temp files can be reduced.
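A sketch for checking how close a run gets to that limit: `ulimit -Sn` prints the soft limit a process actually runs into, and `ulimit -Hn` the hard ceiling a regular user may raise it to. The `PREFIX` default and the small `margin` for stdin/stdout/stderr and the input and output files are assumptions for illustration, and the warning is only meaningful if, as hypothesised above, all temp files need to be open at the same time.

```bash
#!/usr/bin/env bash
# Sketch: compare the open-file limit with the number of temp files from a run.
# PREFIX is an assumption: the path given to chromap -o (temp files are
# assumed to be named "$PREFIX.temp<N>").
PREFIX="${PREFIX:-aligned.sam}"
margin=8   # illustrative allowance for stdin/stdout/stderr, inputs and output

soft=$(ulimit -Sn)   # the limit open() actually runs into
hard=$(ulimit -Hn)   # the most a regular user can raise the soft limit to

# Count temp files; an unmatched glob stays literal and fails the -e test.
n_temp=0
for f in "$PREFIX".temp*; do
  if [ -e "$f" ]; then n_temp=$((n_temp + 1)); fi
done

echo "soft limit: $soft, hard limit: $hard, temp files: $n_temp"

if [ "$soft" != "unlimited" ] && [ "$n_temp" -ge $((soft - margin)) ]; then
  echo "warning: temp file count is close to the soft open-file limit" >&2
fi
```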

@mourisl
Collaborator

mourisl commented May 11, 2024

I have updated the code so that each temp file can hold more reads when too many temp files would otherwise be used, though this may increase memory usage. The updated code is in the li_dev7 branch. Could you please check out this branch and give it a try? Thank you!
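The idea behind the change can be illustrated with some arithmetic: if each temp file holds a fixed number of records, the records per file must be at least the total divided by the number of file descriptors left for temp files. The sketch below is hypothetical (the function name, the 1024 soft limit and the 16 reserved descriptors are illustrative, and it is not Chromap's actual code); it plugs in the 2,658,488,394 mappings reported in utpala101's log in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical illustration, not Chromap's code: choose records per temp file
# so that the number of temp files fits under the open-file budget.
min_records_per_file() {
  local total=$1 soft_limit=$2 reserved=$3
  local budget=$((soft_limit - reserved))      # descriptors left for temp files
  echo $(( (total + budget - 1) / budget ))    # ceiling division
}

total=2658488394                               # mappings reported in this thread
per_file=$(min_records_per_file "$total" 1024 16)
n_files=$(( (total + per_file - 1) / per_file ))
echo "records per temp file >= $per_file -> $n_files temp files"
# prints: records per temp file >= 2637390 -> 1008 temp files
```

Fewer, larger temp files trade descriptor count for memory per file, consistent with the note in the comment above that the change may increase memory usage.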

@utpala101

utpala101 commented May 22, 2024

@mourisl Sorry for the delay. I have tested the new code, but it still failed. The error messages are as follows:


Mapped all reads in 41092.57s.
Number of reads: 3393404416.
Number of mapped reads: 2658488394.
Number of uniquely mapped reads: 1961435054.
Number of reads have multi-mappings: 697053340.
Number of candidates: 580287500208.
Number of mappings: 2658488394.
Number of uni-mappings: 1961435054.
Number of multi-mappings: 697053340.
Temporary file aligned.sam.temp1019 is missing.
chromap: src/temp_mapping.h:45: void chromap::TempMappingFileHandle::InitializeTempMappingLoading(uint32_t) [with MappingRecord = chromap::SAMMapping; uint32_t = unsigned int]: Assertion `file != __null' failed.
Aborted (core dumped)


My work directory had 1019 temp files, matching the number in the error line, and the temp1019 file is much smaller than the earlier temp files (96 MB versus about 960 MB). I am not sure whether my data have problems, but thank you for your work!

@mourisl
Collaborator

mourisl commented May 22, 2024

Thank you for the testing! It is probably still my implementation error. I'll look into it.

@bioswarm


You can try the command "ulimit -n 4096" on your cluster node.
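Because resource limits are inherited by child processes, the limit has to be raised in the same shell (or job script) that launches chromap. Note that a plain `ulimit -n 4096` in bash sets both the soft and the hard limit; the sketch below raises only the soft limit, capped at the hard limit, inside a subshell. `run_with_nofile` is a hypothetical helper, and the chromap line in the final comment is abbreviated from the commands earlier in the thread.

```bash
#!/usr/bin/env bash
# Sketch: run a command with a raised soft open-file limit.
# run_with_nofile is a hypothetical helper, not part of chromap.
run_with_nofile() {
  local want=$1
  shift
  (
    hard=$(ulimit -Hn)
    target=$want
    # A regular user cannot raise the soft limit above the hard limit.
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$want" ]; then
      target=$hard
    fi
    ulimit -Sn "$target" || exit 1   # affects this subshell and its children only
    "$@"
  )
}

# The child process reports the soft limit it inherited.
limit=$(run_with_nofile 4096 bash -c 'ulimit -n')
echo "child soft limit: $limit"

# For the runs in this thread, something like:
# run_with_nofile 4096 chromap --preset hic -r contig.fa -x index -1 R1.fastq -2 R2.fastq --SAM -o aligned.sam -t 90
```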

@mourisl
Collaborator

mourisl commented Jul 19, 2024

Sorry for the delayed reply, @utpala101. The branch's code should be able to handle 20 billion reads. I have updated the code in the li_dev7 branch so that it allows more reads per temp file. The branch also prints a warning whenever the temp file volume is increased, for debugging purposes. If you are still working on the data, could you please give it a try?


6 participants