-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
question about difference of detected cytosine sites between CX_report.txt and bismark.cov.gz #647
Comments
Dear Yingchuo Hu, I believe you are running into an issue that has come up before, and I have tried to summarise it here: https://felixkrueger.github.io/Bismark/faq/context_change/ Could you take a look and see it this answers your questions? |
Dear FelixKrueger, Thank you for your prompt and thoughtful response! This is quite intriguing! I have never thought the possibility that some detected sites have a different cytosine context between the coverage and genome-wide cytosine reports due to deletions. But, I still get confused about it. I mean, if a sequence has been successfully mapped, then the cytosine context is determined, so why would deletion happen? Could it be that mapping and methylation calling employ different strategies for context determination? Furthermore, even in cases where a detected site displays a disparate cytosine context between the coverage and genome-wide cytosine reports, the total number of detected cytosines in the CX_report should theoretically be the same with that in the coverage file. After all, cytosine in both reports extend to cytosines in any sequence context. The cytosine context should not affect the number of sites detection. Sorry to bother you again, but I'm still want to know where I might have made a misunderstanding. Sincerely, Yingchuo Hu |
I am not quite sure I understand the question "so why would deletion happen". Generally, the mapping is carried out first, and then the methylation call is performed afterwards.
In this example, M can be just any matching genomic base, but the read has a 3bp deletion in the middle. These things just happen. In case of the cytosine report the C would be in CHH context ( Regarding the second issue, I tend to agree that in non-CG context the coverage and CX reports should have the same number of positions. So I went to have a look at the code and spotted that there may have been a bug in non-CG mode that dates back a long time! I have now tried to change that (d90d47d), would you mind cloning the latest version and using the Kudos for spotting this, it might have gone unnoticed for a very long time, most likely because mammalian data is typically never run in non-CG mode...) |
Dear FelixKrueger, I apologize for the delay in my response. Thank you for your encouragement. Due to the special experiment design, we are curious about the modification of mammalian genome in the CH context . I've been struggled with the modified Bismark_methylation_extractor and have tried lots of ways to resolve the issue. Despite creating a new conda environment and cloning the latest I am reaching out to seek your expertise to identify any potential mistakes on my part or something else. Your insights on this matter would be highly valuable. Thank you for your time and patience. Best regards, Yingchuo Hu |
Hi FelixKrueger,
I am little confused about the number of detected Cytosine in
CX_report.txt
andbismark.cov.gz.
With the option
--CX
, the coverage file (bismark.cov.gz.
) is extended to cytosines in any sequence context, containing<chromosome> <start position> <end position> <methylation percentage> <count methylated> <count unmethylated>
.As to
CX_report.txt
, it contains<chromosome> <position> <strand> <count methylated> <count unmethylated> <C-context> <trinucleotide context>
and will output every cytosine in the whole genome irrespective of detected or not. But, when I filterCX_report.txt
by$4+$5 >0
, the detected Cytosine number in theCX_report.txt
is still not the same with the Cytosine number inbismark.cov.gz.
An example is listed here:
I checked the extra cytosine in
bismark.cov.gz.
and found it was the cytosine in mm10 genome. So, I am very confused why the number of detected cytosines inbismark.cov.gz.
is 5442 more than that inCX_report.txt
. I wonder if I misunderstood something. I am not sure whether I have expressed my thoughts clearly. Please let me know if you require any additional information to assist with your response.Thanks for your attention and patience!
Looking forward to your reply!
Best regards,
Yingchuo Hu
The text was updated successfully, but these errors were encountered: